Statistical Study of Risk Factors of End Stage Renal Failure in Peshawar, Pakistan

BACKGROUND: End Stage Renal Disease (ESRD) is the last stage of chronic renal failure, in which the kidneys fail completely. AIM: The aim of this study was to identify the important risk factors of ESRD and to construct a model for predicting ESRD among patients in various hospitals of Peshawar, Pakistan. MATERIAL AND METHODS: The data were collected from renal disease patients in three major hospitals of Peshawar. The Brown method was used to obtain an initial model, and backward elimination logistic regression was then performed to find the significant variables (risk factors). Logistic regression was used because the response variable (ESRD) is binary. The statistical packages GLIM and SPSS were used to fit the model and to find the significant variables. RESULTS: For males, the backward elimination procedure selected the predictor variables diabetes, hypertension, glomerulonephritis and heredity; thus, these variables are the main causes of ESRD. For females, the selected predictors were hypertension and the interaction Diabetic*Hypertension, which means that hypertension and the combination of diabetes with hypertension are significant causes of ESRD. CONCLUSION: Our main conclusion from this analysis is that diabetes, hypertension and glomerulonephritis are the significant risk factors of ESRD.


Introduction
Statistical science consists of an ever-growing collection of techniques used to make wise decisions in the face of uncertainty. Statistical methods are an integral part of research in almost all fields of study. Biostatistics, the branch of statistics that studies data about living organisms, is an indispensable tool in medical research and its development. Statistical methods are used to find the significant risk factors of various diseases and are also applicable to psychology, taxonomy, etc. "In statistics, the mathematical modeling plays very important role, specially probability theory which is used to narrate the variation in the data and as a source of drawing conclusions about them" [1].
Generalized linear models, introduced by Nelder and Wedderburn [2], are used extensively in the social sciences and especially in medical research. A generalized linear model is well suited to evaluating the significant risk factors of end stage renal disease.
The main function of the kidneys is to filter the blood and remove waste products from the body. The kidneys also help regulate blood pressure, balance electrolytes in the body and stimulate the production of red blood cells. In medical science, kidney disease is known as renal disease. Kidney diseases are divided into two groups: acute renal disease and chronic renal disease.

End Stage Renal Diseases (ESRD)
ESRD is the last stage of chronic renal failure, in which the kidneys fail completely. At this stage the kidneys no longer remove impurities or control electrolytes. The symptoms of ESRD include reduced urine output, swelling of the legs and face, nausea and vomiting [3].
A cohort study was carried out by Hsu and colleagues in 2009. Using survival analysis, they found that the independent risk factors of ESRD were sex, race, anemia and heredity [4]. A study of 600 people using odds ratio analysis found that diabetes has a strong association with ESRD. Systolic and diastolic hypertension accelerate diabetic nephropathy. Smoking also increases the risk of ESRD in patients with diabetes [5].
A cohort study found that kidney failure was more common in the United States, likely reflecting an aging population with increased rates of hypertension, diabetes and obesity, and that cardiovascular disease (CVD) was a major risk factor of ESRD [6]. The researchers also identified body mass index as a risk factor that has a J-shaped relationship with ESRD [6].
Kamal and Sakhuja [7] proposed that the key causes of ESRD in Pakistan and India are hypertension, glomerulonephritis and diabetes. One third of ESRD patients are affected by glomerulonephritis, while diabetic nephropathy accounts for about one fourth of all patients in India. The average age of ESRD patients in the subcontinent (42 years) is lower than in western countries (61 years).
The leading risk factors of ESRD have been reported as glomerulonephritis, diabetes, hypertension and polycystic kidney disease [8]. In a study of kidney patients, the researchers found that diabetes was the major cause of ESRD, accounting for 44.603 % of all ESRD patients, whereas hypertension accounted for 26.587 % and glomerulonephritis for 12.19 %. A further 16 % of ESRD patients were affected by Systemic Lupus Erythematosus, obstructive nephropathy, etc. Some other risk factors (such as age, sex, race and family history) also contributed to the development of ESRD [8].
The basic aim of this study was to discover the important risk factors of ESRD and to construct a model for prediction of the ESRD patients in various hospitals of Peshawar, Pakistan.

Materials and Method
A common challenge in medical studies is identifying the significant predictors that affect outcomes of interest. The researcher might, for example, be interested in how a stroke patient's response is influenced by diabetes and age. The statistical tool for such a situation is some form of regression model. Regression analysis is a statistical method used to examine the relationship between a dependent variable and one or more independent variables. In some cases the purpose of the analysis is prediction; in others it is simply to identify the important predictor variables. Different kinds of regression methods are available, corresponding to the different types of response variable. Simple and multiple linear regression, for example, apply to continuous response variables such as blood pressure or weight. Logistic regression is appropriate for binary response variables, for example, presence or absence of ESRD, alive or dead at the end of a clinical trial, or recovery or no recovery from a disease.
In this study, we examine the relationship of ESRD (a binary response variable) with various risk factors: age, sex, diabetes, hypertension, glomerulonephritis, hepatitis, heredity, SLE nephritis, polycystic kidney disease, heart disease and obstructive nephropathy. The appropriate tool is therefore logistic regression modeling.

Data Collection
To identify the important risk factors of ESRD, the data were collected from three main hospitals of Peshawar, namely Hayatabad Medical Complex, Khyber Teaching Hospital and Lady Reading Hospital. A total of 407 cases were examined for the presence or absence of ESRD. The risk factors for this study were diabetes, hypertension, glomerulonephritis, polycystic kidney disease, obstructive nephropathy, myeloma, SLE nephritis, hepatitis, family history, drug usage, heart problems, age and sex. The information was collected for patients in the case group (ESRD cases) and in the control group (non-ESRD cases). Of the 407 patients, 244 (60%) were male and 163 (40%) were female. The average age was 43.38 years for male patients and 42.4 years for female patients. Out of the 407 patients, 167 had ESRD and 240 did not. Of the 244 male patients, 108 (44%) had ESRD and 136 did not. Of the 163 female patients, 59 (36%) had ESRD and 104 did not, which suggests that ESRD was more frequent among males than among females in this sample.
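The sex comparison above can be checked directly from the reported counts. The following is a minimal sketch, not part of the original analysis: it computes the proportion with ESRD in each sex and the crude (unadjusted) odds ratio of ESRD for males versus females, with an approximate 95% Wald interval on the log scale. The function names and the choice of a Wald interval are assumptions made for illustration.

```python
import math

# Counts reported in the Data Collection section.
counts = {
    "male": {"esrd": 108, "no_esrd": 136},
    "female": {"esrd": 59, "no_esrd": 104},
}


def proportion_with_esrd(group):
    """Share of patients in the group who have ESRD."""
    c = counts[group]
    return c["esrd"] / (c["esrd"] + c["no_esrd"])


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) with an approximate Wald interval.

    a, b: ESRD / non-ESRD counts in the first group (males here)
    c, d: ESRD / non-ESRD counts in the second group (females here)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


male, female = counts["male"], counts["female"]
odds_ratio, lower, upper = crude_odds_ratio(
    male["esrd"], male["no_esrd"], female["esrd"], female["no_esrd"]
)

print(f"ESRD proportion, males:   {proportion_with_esrd('male'):.3f}")
print(f"ESRD proportion, females: {proportion_with_esrd('female'):.3f}")
print(f"Crude odds ratio (male vs female): {odds_ratio:.3f} "
      f"(95% CI {lower:.3f} to {upper:.3f})")
```

Comparing the interval against 1 gives a rough indication of whether the crude difference between the sexes is distinguishable from sampling variation. This is an unadjusted comparison and ignores the other risk factors.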

Logistic Regression Modeling
Berkson [9] first proposed the logistic regression model and showed how to fit it by iteratively weighted least squares. Logistic regression is commonly used in medical studies, since many studies involve two-category response variables. An early example of the method, in a study of the 12-year prevalence of coronary heart disease, was discussed in [10]. Seven factors measured at the initial examination were studied for their effects on the incidence of the disease: age (in years), ECG (normal or abnormal), systolic blood pressure (mm Hg), cigarettes smoked per day, relative weight, hemoglobin (g/dl) and serum cholesterol (mg/dl). Age and cholesterol level were found to be the most significant predictor variables [10].
The relationship of a subject's health status with age, smoking behavior, sex and socioeconomic group was studied in [11]. The main purpose of that analysis was to determine whether chronic sickness was more prevalent in smokers after adjustment for gender, age and socioeconomic group [11]. An excellent treatment of logistic regression is given by Cox and Snell [12].
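The iteratively weighted least squares fitting mentioned above can be sketched for the simplest case of one predictor. The code below is an illustrative assumption, not the GLIM or SPSS routine used in the study: it fits logit(P) = b0 + b1*x by repeatedly solving a weighted least squares problem on a working response. As toy data it uses the sex-by-ESRD counts from the Data Collection section (x = 1 for males, 0 for females), for which the fitted exp(b1) should agree with the crude odds ratio.

```python
import math


def fit_logistic_irls(x, y, iterations=25):
    """Fit logit(P) = b0 + b1*x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = p * (1.0 - p)                 # IRLS weight
            z = eta + (yi - p) / w            # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# One row per patient: 108 male ESRD, 136 male non-ESRD,
# 59 female ESRD, 104 female non-ESRD.
x = [1] * 108 + [1] * 136 + [0] * 59 + [0] * 104
y = [1] * 108 + [0] * 136 + [1] * 59 + [0] * 104

b0, b1 = fit_logistic_irls(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

With a single binary predictor the maximum likelihood estimates have a closed form (b0 is the log odds among females and b1 is the log odds ratio), which makes this a convenient check that the iteration converges to the right values.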
One form of the multiple linear regression model is:

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (1)$$

where Y is the response variable, equal to 1 if the patient has ESRD and 0 otherwise (for non-ESRD patients). E(Y) in equation (1) can take any real value unless we restrict the values of $\beta_0, \beta_1, \ldots, \beta_k$. Because Y is binary, E(Y) is the probability that Y = 1. Let us denote this probability by P(x), reflecting its dependence on the independent variables $X_1, X_2, \ldots, X_k$:

$$P(x) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (2)$$

For a dichotomous response variable Y, equation (2) is known as the linear probability model. When the observations on Y are independent, this model is a Generalized Linear Model with the identity link function. Equation (2) has a structural problem (a linear function of X is not confined to the interval from 0 to 1), so it is better to use models with a curvilinear relationship between X and P. After transformation, a linear model can be assumed for the dependent variable. The logistic (logit) transformation is the common choice, and can be expressed mathematically as:

$$\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) \qquad (3)$$

where 0 < P < 1. Now the required model is:

$$\ln\left(\frac{P(x)}{1-P(x)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (4)$$

The linear probability model in equation (2) can now be re-written as:

$$P(x) = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}{1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)} \qquad (5)$$

For a single predictor $X_1$, equation (5) becomes:

$$P(x) = \frac{\exp(\beta_0 + \beta_1 X_1)}{1 + \exp(\beta_0 + \beta_1 X_1)} \qquad (6)$$

Logistic regression is a Generalized Linear Model with the logit as its link function. For this model, the odds of a response are:

$$\frac{P(x)}{1-P(x)} = \exp(\beta_0 + \beta_1 X_1) \qquad (7)$$

$$\ln\left(\frac{P(x)}{1-P(x)}\right) = \beta_0 X_0 + \beta_1 X_1, \quad X_0 = 1 \qquad (8)$$

If $X_1$ increases by one unit, the odds are multiplied by $e^{\beta_1}$. The log odds are linear in the predictors, which is why the logit, the log odds transformation, is the suitable scale. The effect can be estimated by this model whether the sampling is prospective or retrospective. Odds are the natural measure of effect in the logistic model, and the ratio of the odds for one observation to the odds for another is known as the odds ratio.
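The statement that a one-unit increase in a predictor multiplies the odds by e raised to its coefficient can be verified numerically. The sketch below uses hypothetical coefficient values (`beta0 = -1.0`, `beta1 = 0.7`) chosen only for illustration; they are not estimates from this study.

```python
import math


def logit(p):
    """Log odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients for a single-predictor model (assumed values).
beta0, beta1 = -1.0, 0.7

for x_value in (0, 1, 2):
    p_now = inv_logit(beta0 + beta1 * x_value)
    p_next = inv_logit(beta0 + beta1 * (x_value + 1))
    ratio = odds(p_next) / odds(p_now)
    # The ratio of odds is the same at every x and equals exp(beta1).
    print(f"x={x_value}: P={p_now:.4f}, odds ratio for +1 unit = {ratio:.4f}")

print(f"exp(beta1) = {math.exp(beta1):.4f}")
```

The probability itself does not change by a constant amount per unit of the predictor; only the odds change by a constant factor, which is why results from logistic models are usually reported as odds ratios.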

Analysis
The data were collected from renal disease patients in three major hospitals of Peshawar. To choose the initial model for the backward elimination procedure, we used the Brown method. Logistic regression is the suitable statistical technique for choosing the risk factors of ESRD because the response variable, ESRD, is binary. Table 1 shows the different steps used, with the coefficients, standard error (S.E.), Wald statistic and significance of each variable selected in the 6th step of the backward elimination procedure. See also Table 2.

Model for Males
For males, we start from the model obtained by the Brown method and apply the backward elimination process using SPSS. Table 3 summarizes the backward elimination procedure. The final model, obtained in the 5th step, contains only the main effects diabetes (D), hypertension (H), glomerulonephritis (G) and heredity (Hd), all of which are highly significant since all calculated p-values are less than 0.05. The regression coefficients, standard errors, Wald statistics, odds ratios and confidence intervals of the odds ratios for the terms selected in the last step of backward elimination are given in Table 4. Thus, the fitted model is: $\operatorname{logit}(\hat{p}) = -4.304 + 3.245\,D + 2.163\,H + 2.164\,G + 4.328\,Hd$
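The fitted model for males can be turned into odds ratios and predicted probabilities as follows. This is a minimal sketch using only the coefficients reported above; it assumes each predictor is coded 1 when the condition is present and 0 otherwise, and the function name is hypothetical. Because the data combine ESRD cases with non-ESRD controls, the intercept reflects the mix of cases and controls in this sample, so the probabilities should be read as sample-based illustrations rather than population risks.

```python
import math

# Coefficients of the fitted model for males, as reported in the text.
INTERCEPT = -4.304
COEFFICIENTS = {"D": 3.245, "H": 2.163, "G": 2.164, "Hd": 4.328}

# Odds ratio for each risk factor: exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


def predicted_probability(D=0, H=0, G=0, Hd=0):
    """Fitted probability of ESRD for a male patient.

    Each argument is assumed to be 1 if the factor is present, else 0.
    """
    profile = {"D": D, "H": H, "G": G, "Hd": Hd}
    eta = INTERCEPT + sum(COEFFICIENTS[k] * v for k, v in profile.items())
    return 1.0 / (1.0 + math.exp(-eta))


for name, value in odds_ratios.items():
    print(f"odds ratio for {name}: {value:.2f}")

print(f"no risk factors:           {predicted_probability():.3f}")
print(f"diabetes only:             {predicted_probability(D=1):.3f}")
print(f"diabetes and hypertension: {predicted_probability(D=1, H=1):.3f}")
```

The odds ratios do not depend on the intercept, which is why they remain interpretable under retrospective sampling even when the fitted probabilities are not.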

Model for Females
For females, the Brown method gives the initial model [D, H, G, Hd, (D*G), (D*H), (H*G), (D*H*G)]. Using the backward elimination procedure, we obtained the best fitted model, with variables H and (D*H), in the 7th step. Table 5 contains the variables H and (D*H), which are highly significant because the calculated p-values are less than 0.05. The table gives the regression coefficients, Wald statistics, standard errors and odds ratios. Table 6 gives various statistics of the fitted model.

Discussion
ESRD is the last stage of chronic renal failure, in which the kidneys fail completely. At this stage, the kidneys no longer remove impurities or control electrolytes. The symptoms of ESRD include reduced urine output, swelling of the legs and face, nausea and vomiting. The aim of this study was to determine the important risk factors of ESRD and to construct a model for predicting ESRD. In this study, the response variable ESRD is binary (equal to 1 if the patient has ESRD and 0 otherwise), so logistic regression analysis is the appropriate technique. At first we selected the initial model by the Brown method. Next, we fitted logistic regression models for the male and female data sets separately. For males, the predictor variables in the fitted model were D, H, G and Hd. This shows that diabetes, hypertension, glomerulonephritis and heredity were the main causes of ESRD. For females, on the other hand, the predictor variables were H and (D*H), which shows that hypertension and the combination of diabetes with hypertension are significant factors for ESRD. Our main conclusion from this analysis is that diabetes, hypertension and glomerulonephritis are the significant risk factors of ESRD.
Our findings are in line with Kamal and Sakhuja [7] who suggested that hypertension, glomerulonephritis and diabetes are the main causes of ESRD in Pakistan and India.