Sunday, December 28, 2014

Literacy in India Sixty Years After Independence - Some Interesting Observations

India is home to about one-sixth of the world’s population and, as such, all measures of its quality of life, namely life expectancy, infant mortality, literacy, per capita income, nutrition, etc., are of paramount importance in today’s globalized economy. A lot has been said and written about illiteracy in India, and since Independence a number of initiatives have been launched by the central as well as various state governments to deal with this malaise. The National Literacy Mission, Sarva Shiksha Abhiyan, the Midday Meal Program in Tamil Nadu, and the 1 Rupee per day grant for school-going children in Bihar are some of these initiatives. Though India has taken significant strides in the eradication of illiteracy, we all know that much remains to be done.

In this blog, I will present some interesting facts borne out by the data on literacy rates in India. I happened to download the 2011 Census data from the website www.censusindia.gov.in and thought of performing some analysis on literacy rates. Personally, I did not expect to uncover much from this exercise but, as I delved deeper into the data, a number of very interesting facts stood out.

If we analyze the literacy rates and poverty levels in various states, a very interesting pattern emerges. It is well known that poverty results in increased levels of school dropouts, because families in poor economic conditions prefer that their children help in augmenting income. Consequently, school dropouts result in illiteracy, and illiteracy definitely does not aid in poverty alleviation. Thus the vicious cycle of illiteracy and poverty continues. Let’s analyze how serious this vicious cycle is in India. The plot of illiteracy versus percentage of people below the poverty line shows a clear relationship between illiteracy and poverty (see Chart A below). As you can clearly observe, higher levels of poverty go hand in hand with higher levels of illiteracy. There are three distinct clusters in this chart. The clusters were identified using a statistical method called K-Means, which groups together states with similar levels of poverty and illiteracy. The first cluster (Cluster 1) is the one with poverty and illiteracy levels that are higher than the norm. Some of the states in this cluster, namely Bihar, Uttar Pradesh, Madhya Pradesh and Odisha, are populous states. In fact, the states in this cluster represent a whopping 42% of India’s population. So in summary one can say that amongst 42% of India’s population, higher levels of poverty are driving illiteracy.


Chart A: Relationship between Illiteracy and Poverty
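
For readers who want to try this kind of clustering themselves, here is a minimal sketch of K-Means in plain Python. It is an illustration only and not the exact procedure used for Chart A: the nine (poverty %, illiteracy %) points are made-up placeholders rather than Census figures, and the function name `kmeans` and the evenly spaced choice of starting centroids are assumptions made for this example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-Means (Lloyd's algorithm) on a list of (x, y) tuples."""
    # Deterministic start (an assumption of this sketch): sort the points
    # and take k evenly spaced ones as the initial centroids.
    ordered = sorted(points)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical (poverty %, illiteracy %) pairs, NOT actual Census data.
points = [
    (8, 10), (10, 14), (12, 12),
    (20, 24), (22, 27), (24, 25),
    (34, 36), (36, 40), (38, 38),
]

labels, centroids = kmeans(points, k=3)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
print("centroids:", centroids)
```

With real data one would replace `points` with one (poverty, illiteracy) pair per state. Because K-Means can settle on different groupings depending on where the centroids start, it is usually worth running it with several different starting points and comparing the results.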
 
Next, the analysis of the male-female literacy rate differential yields some disturbing facts. In a perfect world, the male-female literacy rate differential would be independent of the overall literacy rate. We know that gender bias exists in India, and I wanted to analyze how pronounced this bias is. The following chart depicts the relationship between the male-female literacy rate differential (Y-axis) and the overall literacy rate (X-axis) in various states.


Chart B: Gender gap in literacy rates versus overall literacy
 
The conclusions from this chart are as follows:
  • There is a clear relationship between illiteracy and the gender gap in literacy rates. In statistical terminology, we use a technique known as Linear Regression to understand relationships between two data series. Here the orange line (also known as the Regression Line) depicts the relationship between the male-female literacy rate differential and overall literacy rates in various states. As is evident from this chart, the higher the literacy rate, the lower the gender gap in literacy, and vice versa. This points to the fact that, to some extent, higher levels of illiteracy amongst the masses arise because a larger proportion of women are illiterate. In Statistics, we measure the strength of the relationship between two data series with a metric known as R-squared. R-squared ranges from 0 to 1, and the closer its value is to 1, the stronger the relationship. In this case, with an R-squared of 0.45, I would classify the relationship as moderately strong.
  • There are some obvious outliers. States like Meghalaya, Nagaland, Mizoram, Punjab and Assam are far below the orange line. In other words, in these states the gender gap in literacy is relatively low as compared to their peers with the same level of literacy. In fact, Punjab and Haryana, two neighboring states, present a striking contrast. They both have overall literacy rates that are similar: 67% in Punjab versus 65% in Haryana. Nonetheless, the male-female differential in literacy rates is much lower in Punjab (8%) than in Haryana (15%).
  • Three outliers above the orange line clearly stand out. These are Rajasthan, Dadra and Nagar Haveli, and Daman and Diu. In this state and these two Union Territories, the male-female differential in literacy rates is quite pronounced. Rajasthan poses a major challenge for the literacy of women. In this state, the gender gap in literacy rates is the highest in the country. I also analyzed the gender gap in literacy for the age group 10 to 24 years (see Chart C below). This age group represents children and young adults who should be in middle school, high school or college. As is evident, among the major states in the union, Rajasthan is an anomaly with a significantly higher male-female literacy rate differential. Even amongst the urban population, this difference is quite significant as compared to the other major states.

Chart C: Gender gap in literacy rates versus overall literacy in age group 10-24
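
The regression line, R-squared and outliers discussed for Chart B can be sketched as follows. This is a minimal illustration under stated assumptions rather than the actual analysis: the literacy and gender-gap values are invented placeholders (not Census figures), and the helper names `fit_line` and `r_squared` are made up for this example. The residual, i.e. the actual gap minus the gap predicted by the line, is what tells us how far above or below the line a state sits.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical values, NOT actual Census data:
# overall literacy rate (%) and male-female literacy gap (points).
literacy = [55, 60, 65, 70, 75, 80, 85]
gap = [22, 18, 19, 14, 15, 9, 8]

slope, intercept = fit_line(literacy, gap)
r2 = r_squared(literacy, gap, slope, intercept)

# Residual = actual gap minus predicted gap. A large negative residual
# means the point lies well below the regression line.
residuals = [y - (intercept + slope * x) for x, y in zip(literacy, gap)]
lowest = min(range(len(residuals)), key=lambda i: residuals[i])

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R-squared = {r2:.3f}")
print("point furthest below the line: literacy =", literacy[lowest])
```

A negative slope here corresponds to the pattern described above: the higher the literacy rate, the smaller the gender gap. Sorting the residuals is a simple way to list which states sit furthest above or below the line.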
 
Some other interesting observations can be made if we analyze the literacy rates by age group for the general category of the population, the Scheduled Castes and the Scheduled Tribes (see Chart D below). The gap in literacy rates for the Scheduled Castes and Scheduled Tribes exists in all age groups and gets more pronounced in the higher age groups. This is somewhat expected, because we know that a larger proportion of elderly people are illiterate as compared to the younger generation. What is troubling, though, is that even for the younger population, literacy amongst the Scheduled Castes and Scheduled Tribes lags behind the general population quite considerably. Take for instance the age group 25-29. This is the age when most people start their careers and unfortunately, in this very age group, the literacy rate amongst the Scheduled Castes is 10% lower than in the general category. For the Scheduled Tribes, the corresponding gap is 20%. Let me point out that in India, the Scheduled Castes and Scheduled Tribes represent 25% of the population and therefore, such differentials in literacy rates are definitely not helping the cause of inclusive growth.
 
 
Chart D: Literacy rates by various age groups for the general category, Scheduled Castes and Scheduled Tribes

Now, other than poverty, let me point out another side effect of illiteracy, namely infant mortality (see Chart E below). Based on the statistical measure of R-squared, the relationship between infant mortality and illiteracy can be classified as substantially strong. It is quite evident from this chart that higher levels of literacy help in curbing infant mortality rates, as seen in the states of Kerala and Goa, and the Union Territories of Andaman & Nicobar Islands and Lakshadweep.


Chart E: Relationship between infant mortality and literacy rates

In summary, I would state that in order to improve literacy rates in India, we should be focusing on the following:
  • Stress improving literacy in impoverished states like Bihar, Uttar Pradesh, Madhya Pradesh, Odisha, etc. If special incentives have to be given to attract and retain students from poor families in schools and colleges, then so be it. We obviously cannot afford to have 42% of the population falling behind in literacy levels as compared to the rest of the country.
  • Initiatives are required especially in states like Rajasthan to narrow the gender gap in literacy levels. Again, half the population cannot be disadvantaged in terms of education if we are to promote inclusive growth. It is also conceivable that higher levels of literacy amongst women will reduce infant mortality on account of better family planning, immunization, hygiene, sanitation, etc.
  • Similarly, the literacy rates amongst the Scheduled Castes and Scheduled Tribes need to be raised significantly. Again, a quarter of the population in a developing country simply cannot fall behind in education.

Friday, December 26, 2014

Analytically Yours


I finally decided to write my own blogs. A number of friends, colleagues and well-wishers have been asking me to write blogs related to the application of analytics. We all know that in today’s world, analytics and its application in business have become a hot topic. Obviously, this has been prompted by the movement that is popularly known as ‘Big Data’. I have spent quite a few years working in the field of analytics, and I consider myself extremely fortunate to have been able to work on some very interesting real-life problems in areas like investment banking, risk management, web-based marketing, predictive healthcare, etc. My association with analytics, though, has instilled a deep-rooted belief that advanced analytics should not just reside in the Ivory Tower of Statisticians, but should be applied to the common person’s life.

I am of the firm opinion that in today’s world, solutions to the most complex problems require the application of technology, analytics and human resources. Let’s take the example of high school dropouts in developing or underdeveloped countries. There are lots of reasons for high dropout rates, namely the necessity to supplement family income, lack of infrastructure, inability to cope with the course load, lack of appreciation for the value of education, and so on. Technology can definitely solve the infrastructure issue to a large extent through web-based learning. Massive Open Online Courses (MOOCs) already exist and I expect the MOOC movement to gain rapid momentum in the coming years. Predictive analytics can add the missing touch in preventing dropouts by identifying those who are most likely to drop out, by clustering students based on their abilities and interests and proposing MOOC courses, by analyzing the performance of each student at a micro level and proposing corrective action to the teachers, etc.

My blogs will focus on the application of analytics in all walks of life, not necessarily just business. The blogs that I will write here will focus on addressing the pressing needs of our society today namely – poverty alleviation, eradication of illiteracy, affordable healthcare, sanitation and drinking water, and safety and security of mankind. I will also be inclined to write about various political and social topics with pertinent facts highlighted by the underlying data. I am a keen follower of elections across the globe and particularly in the US, UK and India, and I do intend to write about electoral politics from time to time. I am not a commentator on political or social topics and would only like to focus on the findings based on thorough analysis of the data. Nonetheless, I expect these blogs to stir some thoughtful debates on topics of political and social importance.

At times I do intend to use sophisticated techniques for analyzing the data. But my attempt will always be to explain these techniques and the predicted outcomes in layman’s terms. After all, if analytics is to be used in daily life, we should be able to articulate it in a way that the common man understands and appreciates. At times, I also intend to write on topics that are less serious in nature, for example the application of analytics in sports or in matching your tastes to movies or books. Hopefully, blogs on such topics will add the necessary spice and keep my audience interested.

I am in this for the long haul and I hope to get moral support and constructive feedback from the readers. Lastly, to my audience: if you would like me to analyze and write on a particular topic, please suggest it and I promise that I will seriously examine the feasibility of doing so.

Analytically yours,
Partha Sen