Gritty Graphs - The story that is not told

Last week was pretty tight. In addition to travelling over 5000 km during this time, I actively balanced placements and family time. So I guess a break was the ‘need of the week’ (I know, this was not my best joke). Without any further ado, let’s now get on with this week’s edition of the gritty graphs. In today’s episode we will discuss something fundamental - What data to present?

Before we begin to answer that question, let’s revisit something interesting - the best story known to man. The story of a young David beating the mighty Goliath. The facts and ideas conveyed by the story are extremely beautiful. It is a classic ‘underdog’ tale, emblematic of hope, courage, and the triumph of the seemingly weak over the ostensibly strong. It inspires individuals to face their own 'giants,' regardless of the odds. We’ve heard it as kids and have always aspired to be the underdog who rises against the might of a giant. Well, on the surface, this all looks nice. But then, there is also a lot of detail that’s hidden from us.

For example, little did we know that Goliath may have had a severe visual impairment. (Yep, these are things you can research if you choose the academic’s life. You’re welcome! For more details, read this: https://www.ima.org.il/FilesUploadPublic/IMAJ/0/63/31622.pdf) Now that you know this new information, do you think the story sounds as great as it did just a few minutes ago?

Think about the refreshed headline - Young boy beats a blind man!

Would that sound ok?

Not so much, right?

The beauty of any story is not just in the facts it talks about, but also in the facts it does not talk about.

The same can be said about statistical charts. Each chart tells a particular story, and it also keeps another interesting story away from us. A good chart reader can read the chart and ask the next set of questions that would help him or her paint a more comprehensive story in their mind.

Let’s begin with a simple question and build our way towards the solution now.

Let’s take the example of the placement report of any top-tier B School in India. You would typically see figures such as the highest CTC, the mean CTC, and the median CTC. Some are even bold enough to mention the number of students who appeared for placements that year.

But then, in stats class, we learn that one of the best ways to communicate information about a distribution is to plot a histogram. However, hardly any B School actually presents a histogram of its placement records. Now let’s ask the most important question - Why?

Indeed, the oft-used metrics such as highest, mean, and median CTC offer a snapshot, but they can obscure as much as they reveal. They give the impression of success and affluence, drawing attention to the top earners and averages that might look impressive at first glance. However, this approach misses out on the distribution's depth and the variances that are a fundamental aspect of the data. Revealing this information can tell a more nuanced story of the graduates' economic outcomes.

Consider this: a histogram can provide insights into the full range of salaries, highlighting the peaks, valleys, and overall spread. It reveals the frequency of various salary ranges, offering a clearer picture of what a typical student might expect rather than just the best-case scenarios. A distribution that clusters at lower salary ranges might therefore challenge the narrative of ‘universal high earnings’ that institutes want to present. More often than not, the data contains many outliers, on both the low and high ends. This important information is lost when only central tendencies are reported.
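To make this concrete, here is a minimal sketch in Python. The salary figures are entirely made up for illustration (they are not from any real placement report), and the function names `summarize`, `histogram`, and `render` are my own. It computes the three headline numbers a report usually shows, then bins the very same data into a simple text histogram.

```python
import statistics
from collections import Counter

# Hypothetical CTC figures in lakh rupees per year, invented for illustration.
salaries = [12] * 30 + [15] * 40 + [18] * 15 + [25] * 8 + [40] * 5 + [60, 110]

def summarize(values):
    """Return the three headline numbers a placement report usually shows."""
    return {
        "highest": max(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
    }

def histogram(values, bin_width=10):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def render(hist, bin_width=10):
    """Print one row of '#' marks per non-empty bin."""
    for start, count in hist.items():
        label = f"{start}-{start + bin_width - 1}"
        print(f"{label:>8} | {'#' * count} {count}")

print(summarize(salaries))
render(histogram(salaries))
```

With these made-up numbers, the report would say ‘highest 110, mean 18, median 15’. The histogram shows that 85 of the 100 students sit in the 10 to 19 band, and a single student earns the headline figure.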

So, why don't B Schools showcase this?

In addition to painting a really rosy picture, presenting only limited data on an institution can attract more applicants, thereby enhancing its prestige and perceived value. There's also the matter of simplicity – mean and median figures are straightforward and easily digestible by a broad audience without a statistical background. Histograms, while informative, require a bit more effort to understand and interpret. Therefore, institutes often choose not to present histograms or box plots.

There may also be other reasons why institutes do not showcase this data. In my view, institutes might want to avoid deterring potential students with the reality that not everyone lands a top-tier salary or that there's a significant variance in what graduates earn.

This selective data presentation does a disservice to prospective students who deserve a complete and honest picture of their potential future. Personally, it causes me a lot of sadness to see Instagram handles post mean and median CTC numbers of various IIMs. In reality, no two IIMs are made the same, and there is a lot of variation between the numbers. For example, it is commonly understood that engineers get better salaries while non-engineers get a slightly lower starting salary. Now, there are institutes that have almost 100% engineers and then there are those that have hardly 10% engineers. Does it make sense to compare the two just on the basis of means? Given that students are quick to make decisions and are easily steered by these factoids that are presented to them, it is somewhat understandable that institutes do not reveal this information readily. While transparency in data would empower them to make informed decisions based on a realistic understanding of potential outcomes, it may also prompt institutions to address directly the reasons behind the broader salary distributions, potentially leading to curriculum improvements and better career support services.
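The engineer versus non-engineer point can be sketched with a small calculation. Everything below is hypothetical: the two institutes, the batch mix, and the group-wise means are invented to show the effect, and `overall_mean` is just an illustrative helper. In this made-up case, Institute B pays more than Institute A within each background, yet its overall mean is lower simply because of who is in the batch.

```python
# Hypothetical figures, invented purely for illustration (lakh rupees per year).
# Each institute: background -> (number of students, mean CTC of that group).
institutes = {
    "Institute A": {"engineers": (95, 20.0), "non_engineers": (5, 16.0)},
    "Institute B": {"engineers": (10, 21.0), "non_engineers": (90, 17.0)},
}

def overall_mean(groups):
    """Weighted mean CTC across all groups of one institute."""
    total_students = sum(count for count, _ in groups.values())
    total_pay = sum(count * mean for count, mean in groups.values())
    return total_pay / total_students

for name, groups in institutes.items():
    print(f"{name}: overall mean = {overall_mean(groups):.1f}")
    for background, (count, mean) in groups.items():
        print(f"  {background:<14} n={count:<3} mean={mean:.1f}")
```

Here the overall means come out to 19.8 for A and 17.4 for B, even though B is ahead in both groups. A single mean hides the batch composition, which is exactly the story that is not told.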

Addressing why histograms or detailed distributions are not commonly used also invites a broader discussion about transparency and accountability in education. As the demand for data-driven decisions increases, there's a growing need for educational institutions to adopt more comprehensive and transparent reporting practices. Such a shift would benefit students and enhance the schools' credibility and trustworthiness.

But then, as with the story of David and Goliath, let’s go one step deeper and ask: are audiences ready for such levels of transparency? Could exposing such critical data put an institution at a significant disadvantage, especially when the competition to attract good students is really high?

While the push for transparency seems morally and ethically sound, it operates in a complex ecosystem where perceptions significantly influence decisions. Suppose one institution starts presenting all its data candidly, including histograms of salary distributions. In that case, it might initially face backlash or a dip in applications, especially if its competitors continue showcasing only the most favorable statistics. Students, driven by aspirations and societal benchmarks of success, might misconstrue transparency as a lack of success or prestige.

This brings us to a critical juncture in the conversation about readiness and reception. Are prospective students, in their quest for the best possible futures, ready to embrace this level of candor? It requires a paradigm shift in measuring and understanding success, moving away from top-line numbers to a more holistic understanding of outcomes. This shift isn't easy; it demands a mature audience that can appreciate the value of transparency and is equipped to interpret complex data comprehensively.

Furthermore, the competitive landscape of B Schools cannot be ignored. Schools are under immense pressure to present themselves in the best light in a world where rankings, reputation, and perceived prestige play outsized roles in decision-making. If one school opts for radical transparency while others do not, it risks being perceived as less successful, regardless of the educational quality or the actual value it provides to its students.

In this context, collective action could be a solution. If all institutions agreed to a standard of transparency, no single school would be at a disadvantage. However, achieving this level of cooperation is challenging in a competitive environment where each institution is vying for a limited pool of candidates.

Ultimately, the discussion about data presentation in B Schools reflects broader societal questions about truth, representation, and the role of education. Just as the story of David and Goliath encourages us to look beyond the surface and question deeper truths, this debate urges us to consider what we value in education and how we define and measure success. As we evolve in our understanding and approach to data, perhaps we can foster a more informed, discerning, and fairer educational landscape.