Nearly every paper I come across is a preprint. We take it for granted now, but 33 years ago, arXiv—as we know it—emerged from the halls of Los Alamos National Laboratory, revolutionizing how we share research. I thought an exposé was long overdue, so here’s an attempt.
From Papyrus to Preprints
By Hamidah Oderinwale, with editing led by Kevin Baker
A new culture centered on the preprint and the agentic scientist
In the not-so-distant past, scientific discoveries crawled forward at a glacial pace. These days, research papers wither behind paywalls, their insights locked away. When peer reviewers keep their gates closed, breakthroughs fade into oblivion. After Croatian virologist Beata Halassy successfully treated her recurrent breast cancer by self-experimenting, she found no place to share her results with the world. Journal after journal rejected her report. Her story is just one example of the system falling short, but it leaves other researchers and the greater scientific community wondering how many transformative discoveries never made it past a reviewer's desk.
Halassy's struggle revealed an ongoing tension in modern science. To maintain the rigor of the research commons, we rely on trusted institutions and reviewers to delineate what is and is not acceptable. But, like the rest of us, they are prone to misjudgment, yet they are often treated as though their decisions are beyond question. When we fail to recognize how peer review falls short—whether in deciding which papers to publish, which projects to fund, or by moving too slowly—we risk stifling innovation and leaving potential unrealized.
Preprints are already an integral part of today's research culture. Within this ecosystem, competing visions for science naturally emerge: one champions democratization through decentralization, while the other laments the erosion of the selective practices that curate a vast volume of research and shower attention on academic stars.
Preprints—papers shared publicly before undergoing peer review—are a rational response to the current regime's sluggishness. It took the internet and a few pioneering individuals to turn this obvious idea into reality. They paved the way for reform and helped establish a new culture centered on the preprint and the agentic scientist.
A brief history of peer review
The practice of peer review traces back to scholarly traditions of the 18th century, including early publications like the Royal Society of Edinburgh's Medical Essays and Observations. However, the process didn't crystallize into its current form until the second half of the 20th century, when journals like Nature formalized systematic review of submitted research. The fundamental purpose of modern peer review is quality control: to discern valid and important research, distinguish good science from flawed studies, and, ideally, identify groundbreaking work.
Here's how peer review works today: when scientists have findings to share, they first draft a manuscript. The manuscript is then submitted to a journal for dissemination. Based on its acceptance criteria and scope, the journal initiates a peer review process in which appointed experts evaluate the work's quality, rigor, and significance. Reviewers may recommend revisions, reject the manuscript, or accept it for publication. If accepted, the work is finally published, often months or even years after it was first submitted.
People can go to Cell to read about genome engineering or to Physical Review Letters (PRL) to read about quantum sensing. For a given journal, a reader should know what to expect. Each paper has to pass someone's bar, and if the names on the masthead do their jobs well, accepted work should be more than merely acceptable: it should be a quality publication, sculpted to the journal's purview. With time being a scientist's most valuable resource, the editorial process should offer much-needed signal amid the noise of today's vast research corpus.
Publication peer review can also be a means of ensuring equity, a way to bring lesser-known but deserving scientists into the limelight. As a journal builds a community of trusted, high-profile readers, it can put novel work on its audience's radar and, in return, catapult the careers of those doing important, novel research. Thus, the researcher-journal relationship operates with reciprocity. Researchers gain recognition and status by being accepted into established journals. Journals grow in status by choosing the "best" researchers to feature. And the biggest dictate what's "in" for the community at large.
Over time, journal subscribers come to know what content to expect, trust is slowly built, and while each paper is different, readers know they can reference a journal's archive and find what's relevant to them. The strength and size of these communities reflect the journal's age and prestige. For authors, publications in select journals carry weight in hiring and status; those journals, in turn, receive more submissions. A self-reinforcing system of influence takes shape and, absent any external force, cycles in perpetuity.
The basic model of peer review has not changed much since the 1700s, but just because our goals haven't changed doesn't mean the process should remain the same. We ought, then, to ask whether it still serves its goals as intended, and what those goals even are to begin with: what should they be?
Surely, computers and email have made life more efficient. Today, papers are sent to inboxes instead of mailboxes, and there are also many more of them being written and submitted. With insufficient capacity, peer review stays tedious and slow. It makes sense that researchers would share their manuscripts digitally before they’ve received coveted acceptance or dreaded rejection. After all, the value of a scientist’s work is in who it’s read by, and for it to be read it must be shared.
An upheaval of the archive
Physicists were among the first to adopt preprints. In seminar rooms, researchers frequently left drafts for colleagues to read. Joanne Cohn, a UC Berkeley astrophysicist, saw the potential of the tradition, so in 1989 she decided to bring it to the masses and started a mailing list to share the growing pile of papers she'd accumulated. With the technology of the time, the manually run list struggled to keep up. In 1991, Paul Ginsparg offered to automate it; the rest was history.
Ginsparg hosted the new physics preprint network at xxx.lanl.gov. When he moved to Cornell in 2001, the repository moved with him under the domain name we've come to know: arXiv.org. It went from a side project to a cornerstone of his academic career, and it's now supported by the NSF, the Simons Foundation, and Ginsparg's current employer, Cornell University. Today, arXiv counts 5 million monthly active users, 2 million submissions, and 2.6 billion downloads; it has grown into something both substantial and influential.
Initially, arXiv was merely a tool to make paper-sharing a little easier. But as the platform has scaled, it’s evolved into something more like an alternative academic library—perhaps the grandest modern effort to democratize science. As such, I argue that its growing influence warrants deeper discussion, particularly regarding its mark on how science is shared, accessed, and critiqued.
On arXiv's main page, you will find links to works in mathematics, physics, computer science, and other quantitative subjects. Each link directs readers to a subject-specific feed of related content. The feeds display papers chronologically, and then there's not much else: no comments, likes, or discussions; feedback is to be shared elsewhere, off the platform. arXiv skips formal review because it's a repository, not a journal. And it holds more than manuscripts: submissions can carry the other pieces of a research project, such as code and data. Evaluating research means examining everything, findings, methods, and the materials behind them.
Authors can update their papers multiple times, with each update generating a new version of the paper under the same identifier. The versioning system shows that arXiv isn't just a static repository; it tracks the living nature of research. Versioning lets authors fix typos and, more importantly, fold in updates from their ongoing work. In science, we expect findings to change and experiments to evolve. But as reasonable as this kind of versioning seems, arXiv was essentially the first to offer it.
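To make the mechanics concrete, here is a minimal sketch of how those versioned identifiers look from the outside, using arXiv's public query API at export.arxiv.org. The specific identifier below is purely illustrative; any paper with multiple versions behaves the same way, with one base identifier and a version suffix per revision.

```python
# Minimal sketch: inspect two versions of the same arXiv paper via the
# public query API. The identifier is illustrative; any multi-version
# paper works the same way. Uses only the Python standard library.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace of the Atom feed the API returns

def fetch_version(arxiv_id: str) -> dict:
    """Return basic metadata for one versioned arXiv identifier, e.g. '1706.03762v1'."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    return {
        "id": entry.findtext(f"{ATOM}id"),                # resolves to a specific version
        "title": entry.findtext(f"{ATOM}title").strip(),
        "published": entry.findtext(f"{ATOM}published"),  # when v1 first appeared
        "updated": entry.findtext(f"{ATOM}updated"),      # when this version was posted
    }

if __name__ == "__main__":
    # Same paper, same base identifier, two snapshots in time.
    for versioned_id in ("1706.03762v1", "1706.03762v2"):
        print(fetch_version(versioned_id))
```

The point of the sketch is simply that every revision stays addressable: citing the bare identifier always points readers to the latest version, while the suffixed identifiers preserve the paper's history.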
arXiv also maintains a permanent record of every article and version posted, irrespective of the license an author chooses. Authors have their pick of licenses, but the choice is irrevocable, both for the original work and for any subsequent version. And while publishers often require that researchers transfer copyright in their work, arXiv, despite its institutional home at Cornell Tech, claims no ownership of the work authors submit.
SSRN vs. arXiv
arXiv was the first preprint server, originally hosting research manuscripts in math, computer science, physics, and other technical subjects. Its success inspired a wave of discipline-specific platforms such as bioRxiv, medRxiv, PsyArXiv, and SocArXiv. While these platforms resemble arXiv in purpose and functionality, they cater to different academic communities, are governed by distinct entities, and each has its quirks. And while arXiv pioneered the preprint model as a publicly accessible, non-commercial repository, SSRN, a privately owned preprint server, took a different approach. Financial economists Michael C. Jensen and Wayne Marr launched SSRN in 1994. About two decades later, in 2016, SSRN was acquired by the publishing giant Elsevier. At the time of the acquisition, SSRN hosted 1.2 million articles from more than 1.7 million researchers and had generated over 275 million downloads.
The acquisition stirred controversy, largely because of Elsevier's history of aggressive copyright enforcement. Elsevier has asked the University of Calgary to take down papers authored by its own researchers and has lobbied for broader takedown authority; its actions also cost Sci-Hub, a site providing free access to academic papers, its domain name. Without context, this might look like simple cruelty, but it follows from the business model: the profits depend on controlling access. With free alternatives like Sci-Hub or arXiv a click away, users facing a paywall will leave at the first pop-up.
arXiv is supposed to represent a departure from the ivory tower, so it might seem wrong that Cornell, a university, owns it. But with SSRN, we see a more concerning alternative: corporate ownership with weaker institutional accountability, where self-branding as “free access” is a ruse to hide commercial motivations.
Infodemics and superconductors
arXiv promises democracy, but it has proven to be restrictive in its own ways, with opaque decision-making processes reminiscent of traditional journals. While established researchers might assume getting their work onto arXiv would be straightforward, the reality is often more complex.
Take the case of Nicolas Gisin, a respected quantum physicist from the University of Geneva, who discovered this firsthand when his mentee's work on black hole physics was mysteriously rejected from the platform. This is only one case of a mysterious arXiv rejection. Researchers seeking explanations for why their work didn’t make the cut have been met with vague or unreasonable explanations, such as a submission being “unrefereeable” or being filed under the wrong subcategory.
While arXiv's filtering is imperfect, the public and media also play a crucial role in determining what information gets broadcast. When COVID hit, there were many unknowns. To address the emergency at hand, experts had to learn as much as they could, as fast as they could. As scientists worked to understand how the virus spread, governments didn’t have the time to wait for their findings to be published in journals months down the line. Instead, they turned to preprints, which were available through platforms like arXiv, because of the unparalleled speed they offered in sharing results.
And while preprint servers made information quickly available, there was no guarantee of quality. Scientists carefully parsed through the research at their disposal, while many journalists combed through preprints for sensational stories. Tentative, early-stage findings were often exaggerated as panaceas in headlines.
These problems aren't limited to the world of medical research. In the summer of 2023, a group of South Korean researchers reported a new discovery: a room-temperature superconductor called LK-99. LK-99 was a big deal because the superconductors in use today—in MRIs, particle accelerators, and experimental fusion reactors—only work at cryogenic temperatures far below freezing. Keeping them that cold is very expensive, so a room-temperature superconductor could save billions. No doubt, something like LK-99 would be transformative.
Knowing how big it would be if true, researchers raced to replicate the results. On Twitter, where most of the discourse unfolded, it wasn’t long before the world watched what seemed like a transformative instance of scientific progress crumble into dust. In emergencies like a pandemic or moments of potential breakthrough like LK-99, the first works to enter public view shape narratives. Like viruses, misinformation spreads quickly. When people grasp at any available information, the first works to surface often dominate the public conversation, sometimes with dangerous consequences.
arXiv wouldn’t have changed the outcome of LK-99, but its role in facilitating debate about the discovery shouldn’t be overlooked. “Failures” may not pass peer review, but they’re still worth discussing. Unfruitful findings can be seen as incomplete research—one insight away from a groundbreaking discovery. Lee, Kim, and their team offered hope that a room-temperature, resistance-free superconductor could someday exist and showed there might be a tractable path forward.
arXiv’s influence on science lies in its speed and scope, providing a platform that enables the dissemination of a wide range of new ideas. But not all of these ideas will have equal merit. News watchers know not to misconstrue breaking headlines as definitive pronouncements. Similarly, arXiv's role is to list the latest work, and it will be up to us to pick out which titles deserve attention. In today's dense information landscape, arXiv curates content, but the research's impact depends on how it circulates and sparks scientific discourse.
Historically, embargoes have helped to mediate the journal-scientist relationship. They coordinate the coverage of unpublished scientific papers. Even in a world where papers are submitted to journals alongside preprint servers, embargoes help the media get their facts straight, frame the right narratives, and maintain respect for researchers and their work.
This is important because arXiv shouldn’t be seen as just a passive host. In today’s climate, popularity in research is often pursued as an end in itself. Preprints and social media make it no longer necessary for a work to be picked up by an elite journal to gain recognition; a tweet with a link or a column feature is enough. We should be wary of a new arXiv culture that feeds into hype cycles and the desire for virality. While a pandemic or path-altering technology should certainly be discussed and shared, in these cases, truth matters most.
To conclude,
We stand at a juncture for academic scholarship, and it's up to us to pick our poison. The central question is not just what science we pursue, but how we organize its pursuit. Before us lie two visions: a decentralized, community-driven model in which research flows freely between peers, governed by the people, for the people; and a curated, stratified system in which established institutions continue to guide the discourse and prioritize the time of their luminaries. At this crossroads, our decisions will define how we share and validate scientific knowledge in the decades to come.
Hamidah Oderinwale is an editor at Reboot. She would like to thank Ben Recht for his valuable input, Paul Ginsparg for replying to her email, and the rest of the Reboot editorial board for their helpful feedback, with special appreciation to Kevin Baker for his thoughtful revisions and editing!
Reboot publishes essays on tech, humanity, and power every week. If you want to keep up with the community, subscribe below ⚡️
🌀 microdoses
Andrej Karpathy's arXiv sanity preserver.
An interactive map of arXiv papers.
alphaXiv to discuss arXiv papers.
Apparently, the Indian government is buying its researchers access to paid journals.
If you’re on the hunt for a new sweatshirt:
💝 closing note
This essay is the first of two. In the second, I touch on how preprint culture and ML research interact. For the first time in its nearly 40-year history, NeurIPS, the leading machine learning conference, will admit attendees via lottery due to limited capacity. Few fields have grown as quickly as machine learning, and its online-first culture makes it unique; we'll explore how this new era of idea-sharing has shaped it.
Till next time,
— Hamidah & Reboot team