1 Million More Aellas

Feb 24, 2023

I wish more people felt empowered to playfully explore data and publish their findings.

Unfortunately, there are numerous barriers that make this daunting. People who are not already well-versed in data science are often told they must use intimidating data tools. While there are plenty of low-code and no-code platforms that provide an alternative, it's difficult to know which actually make it easy to do things correctly.

That’s because novice data explorers face a harsh deterrent: the fear that they are not following best practices, and that their analysis and findings will be picked apart by credentialed experts. These concerns are not unfounded; many experts can be quite publicly critical, sometimes solely because the person lacks credentials. As a professional data novice myself, I feel this deeply. Fortunately, I have nerds like Nathaniel Bechhofer and Andrew Hornstra to check me and tell me when I’m making a rookie mistake. But not everyone has a team of nerds.

A couple of weeks ago, Lex Fridman had a conversation with Aella, a sex researcher who has conducted the largest human sexuality survey to date. Following the podcast, I saw tweet after tweet from people with prestigious affiliations and degrees in their bios criticizing Aella’s work: it's not peer reviewed, it uses non-representative samples, its methodology is careless. The worst were the credentialist critiques that dismissed Aella’s research simply because she chose not to spend years in graduate school earning a doctorate.

Aella’s response to the criticism was, of course, golden. In a tweet thread, she likened data cultivation and exploration to the construction of a chair. She asked readers to imagine a room full of people without chairs, and suggested that even if you lacked carpentry skills, you could still build a functional chair to meet the need. Your chair might have some flaws, but it could still be used. Despite your humble disclaimer that you are not a carpenter, some carpenters may criticize your chair simply because you are not part of their guild, without ever considering the merits of your methods.

To this, I say the more chairs for people to sit upon and choose between, the better. After all, the chairs left unchosen demonstrate that the construction could be improved. Further, the discourse about the chair's construction can be beneficial for the next person who tries to build a chair, novices and experts alike.

Science should draw on input from people of all backgrounds, not just those in the guild. Doing so would further democratize knowledge and surface new insights and perspectives that can be critiqued and, when appropriate, adopted. In addition, fostering a culture of inclusion and openness in scientific research would improve public trust in science. When members of the public are able to participate in scientific research and contribute their knowledge and insights, they are more likely to understand the value and limitations of scientific findings and to appreciate the role that science plays in their lives.

That’s why I think there should be 1 million more independent researchers like Aella. Everyone should be able to playfully explore data and contribute to the discourse. Aella is a great model of how to conduct independent research with transparency, honesty and humility. She makes her data publicly available and cares enough to update her priors in response to new evidence (which, frankly, is more than can be said of many of her credentialed counterparts).

Others and I are working to build tools that lower the technical barriers and enable quality independent research like Aella’s. In doing so, I hope to pave the way to a future where independent research is indistinguishable from credentialed research.

That's the ultimate goal with TrovBase: to empower everyone to play with data and feel confident that they are doing it right. (Sorry, yes, this is kind of clickbaity for my startup, but hear me out: the point is broader than just TrovBase. If you’re new here, TrovBase is a data management platform that enables anyone to collaboratively build reproducible datasets.) To this end, we have embedded best practices into TrovBase, so you don't have to worry about checking yourself against the rules; we won't let you do the wrong thing. You can configure a dataset to validate incoming data against rules you set, which keeps the dataset clean by construction. If any data does not match the configuration, it will not be added to the dataset, giving you added peace of mind as you explore.
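To make the idea of a validation gate concrete, here is a minimal, generic sketch in Python. It is not TrovBase's API or implementation; the `ColumnRule` and `Dataset` names, the example columns, and the rules themselves are all hypothetical. It only illustrates the principle described above: each incoming row is checked against configured rules, and a row that fails any rule is kept out of the dataset, with the reasons recorded.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical rule type (not a TrovBase API): a column name, a predicate
# its value must satisfy, and a human-readable description of the rule.
@dataclass
class ColumnRule:
    name: str
    check: Callable[[Any], bool]
    description: str

@dataclass
class Dataset:
    rules: list[ColumnRule]
    rows: list[dict[str, Any]] = field(default_factory=list)
    rejected: list[tuple[dict[str, Any], list[str]]] = field(default_factory=list)

    def add(self, row: dict[str, Any]) -> bool:
        """Append the row only if every rule passes; otherwise record why it was rejected."""
        errors = []
        for rule in self.rules:
            if rule.name not in row:
                errors.append(f"missing column {rule.name!r}")
            elif not rule.check(row[rule.name]):
                errors.append(f"{rule.name}={row[rule.name]!r} fails: {rule.description}")
        if errors:
            self.rejected.append((row, errors))
            return False
        self.rows.append(row)
        return True

# Illustrative configuration for a survey-style dataset (columns are made up).
rules = [
    ColumnRule("respondent_id", lambda v: isinstance(v, int) and v > 0, "positive integer"),
    ColumnRule("age", lambda v: isinstance(v, int) and 18 <= v <= 120, "integer between 18 and 120"),
]
ds = Dataset(rules)
ds.add({"respondent_id": 1, "age": 34})  # accepted
ds.add({"respondent_id": 2, "age": 7})   # rejected: age out of range
ds.add({"respondent_id": 3})             # rejected: age column missing
print(len(ds.rows), len(ds.rejected))    # prints: 1 2
```

Keeping rejected rows alongside their error messages, rather than silently dropping them, is one way to make the "it will not be added" guarantee auditable later.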

In the words of Adam Savage of Mythbusters, “the difference between screwing around and science is writing it down.” Writing it down is the boring part, and with TrovBase we plan to abstract it away by taking care of the logging and documentation of what happens on the platform. The configuration will double as a data dictionary, and we will log everything from configuration to import to extract.
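As a rough illustration of how one configuration can double as documentation, here is a small Python sketch. It is a hypothetical design, not how TrovBase actually works: the `CONFIG` layout, the `ActivityLog` class, and the file name `survey_wave_1.csv` are invented for the example. The same per-column specs could drive validation and also be rendered as a data dictionary, while an append-only log records each step from configuration to import to extract.

```python
import json
from datetime import datetime, timezone

# Hypothetical configuration: each column carries a type and a description,
# so the same object can drive validation and be printed as a data dictionary.
CONFIG = {
    "respondent_id": {"type": "int", "description": "Anonymous survey respondent ID"},
    "age": {"type": "int", "description": "Self-reported age in years"},
}

def data_dictionary(config):
    """Render the configuration as a plain-text data dictionary."""
    lines = ["column | type | description"]
    for name, spec in config.items():
        lines.append(f"{name} | {spec['type']} | {spec['description']}")
    return "\n".join(lines)

class ActivityLog:
    """Append-only record of what happened to the dataset, in order."""

    def __init__(self):
        self._entries = []

    def record(self, step, **details):
        # Each entry stores the step name, a UTC timestamp, and any details.
        self._entries.append({
            "step": step,
            "at": datetime.now(timezone.utc).isoformat(),
            **details,
        })

    def entries(self):
        # Return a copy so callers cannot rewrite history.
        return list(self._entries)

log = ActivityLog()
log.record("configure", columns=list(CONFIG))
log.record("import", source="survey_wave_1.csv", rows_added=1, rows_rejected=2)
log.record("extract", filter="age >= 30", rows=1)

print(data_dictionary(CONFIG))
print(json.dumps([e["step"] for e in log.entries()]))  # prints: ["configure", "import", "extract"]
```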

I’ll make one more addition to Adam’s quote, as others have before me: the difference between screwing around and science is writing it down and reproducing it. TrovBase will take care of that too. You’ll be able to share your extracts and visualizations with collaborators and the world via a permalink that includes the underlying data, the metadata, the code, and, most importantly, the decisions that were made along the way to get to your output.
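One common way to make a shareable link stable is to derive it from the content it points to. The Python sketch below bundles data, metadata, code, and decisions together and hashes a canonical JSON encoding to get an identifier. This is an assumed design for illustration only, not TrovBase's permalink scheme; the `make_bundle` and `bundle_id` helpers and the `https://example.org/b/` URL are hypothetical.

```python
import hashlib
import json

def make_bundle(data_rows, metadata, code, decisions):
    """Package everything needed to reproduce an output (hypothetical format)."""
    return {
        "data": data_rows,
        "metadata": metadata,
        "code": code,
        "decisions": decisions,
    }

def bundle_id(bundle):
    """Content-derived ID: identical bundles always hash to the same ID."""
    # sort_keys plus fixed separators gives a canonical encoding of the bundle.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

bundle = make_bundle(
    data_rows=[{"respondent_id": 1, "age": 34}],
    metadata={"source": "survey_wave_1.csv", "rows_rejected": 2},
    code="rows = [r for r in rows if r['age'] >= 18]",
    decisions=["Excluded respondents under 18 as out of scope for this survey"],
)
permalink = f"https://example.org/b/{bundle_id(bundle)}"
print(permalink)
```

Because the decisions are part of the hashed content, changing a single analytic choice yields a different link, so a shared permalink always refers to exactly one path from data to output.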

Once TrovBase and other tools lower the technical barriers, the only barrier left, and probably the hardest to overcome, is the social one held up by credentialism. We should overcome the instinct to trust only people with institutional affiliations by affirming that science works best when it is most collaborative. To those who fear that independent research will proliferate “bad science”: the purpose of independent research is not necessarily to validate the opinions of random redditors, but rather to allow a greater number of people to scrutinize and test theories against available data. Our teachers told us that the scientific method, applied consistently, weeds out incorrect hypotheses as they are tested with data. If we take that scientific approach seriously, then we can get the value from independent researchers while discarding wrong ideas.

When I say “I wish more people felt empowered to playfully explore data and publish their findings,” I’m talking to myself. I have a four-year degree in economics with a minor in data analysis. Most of my data analysis experience came from using headache-inducing tools like Minitab, and I relied heavily on friends to get me through econometrics coursework. I didn’t pursue further academic training to become a scientist; that career isn’t my comparative advantage. As a result, some people might find it suspect (even grifty) that I lead a startup aimed at making research more reproducible. I struggled deeply with this (and continue to!).

So why am I, not a scientist, building tools to improve science? Because I take slow progress personally! I want to go to Mars. I want energy that is too cheap to meter. I want myself, my friends, and my family to live long, healthy lives. It's not my comparative advantage to discover cures for cancer and other diseases, but it is my comparative advantage to build the infrastructure that enables and accelerates those discoveries.

I don’t care if that progress comes from someone in a prestigious institution. The value of scientific discoveries lies in their potential to benefit society, regardless of who makes them. Whether they come from credentialed scientists or novice independent researchers, the impact of their findings is what truly matters. That's why I'm not just building TrovBase for professional academics, but also for people without university affiliations to make their own contribution to scientific research.

Tooling for Optimism
