Re-engineering Higher Education for the AI Economy
Why Universities Need Their Skunk Works
I. The Supertanker Problem
Picture a container ship at full speed in open water. Three football fields long, laden with 20,000 containers, moving at 25 knots. The captain spots an iceberg. He orders a hard turn.
The ship begins to turn. Slowly. Very slowly.
It will take three miles to complete the maneuver.
This is not a failure of the captain’s vision. It’s not reluctance to act. It’s physics. Mass resists acceleration. Momentum must be bled off gradually or structural damage occurs. The larger the vessel, the more distance it needs to change course.
Universities are supertankers.
And if you’re a provost, a dean, or a department chair, you’ve already seen the iceberg.
You open The Economist, expecting the usual hand-wringing about AI and jobs, and instead find a celebration. Data annotators now command $90 per hour. They need expertise in finance, law, or medicine. The article calls this “job creation.”
You pause. Something doesn’t add up.
Two years ago, data annotation meant tagging images for $15 an hour, maybe less if you lived in Nairobi or Dhaka. Now the same job title requires a medical degree or legal expertise. The Economist treats this as an uplifting story about new opportunities. What they buried: the ladder just lost its bottom rungs.
The floor didn’t collapse everywhere—not yet. Administrative coordinators still schedule meetings. Quality assurance testers still click through software interfaces. Junior developers still write CRUD applications. These jobs exist. But they’ve stopped being what they once were: entry points to upward mobility. You can still get hired to do procedural work. You just can’t build a career on it anymore.
That $80,000-to-$120,000 React programmer—the one who could generate competent boilerplate, follow established patterns, ship features on time—has a new competitor. The competitor costs pennies per task, never sleeps, never asks for raises, and improves every quarter. The programmer is still employed. But the premium is gone.
You’re watching the great repricing. Not job elimination. Wage compression for anything that looks like pattern-matching.
Universities see this happening. Department chairs read the same articles you do. Provosts commission reports on workforce transformation. Faculty discuss AI in curriculum committee meetings.
But seeing the iceberg and turning the ship are different problems.
II. The Inertia
A modern research university isn’t a monolith. It’s a complex adaptive system with thousands of moving parts, each with its own velocity and momentum.
There are tenured faculty hired in the 1990s who structured their careers around lecture-based teaching and publication-focused research. Their expertise is real. Their pedagogy is optimized for a world where memorization and recall had economic value.
There are accreditation bodies that evaluate programs based on learning outcomes written in 2015, before large language models existed. Those standards require measurable assessments, documented rubrics, defensible grading schemes. “Sound judgment under ambiguity” doesn’t fit the template.
There are legal departments that view any subjective assessment as litigation risk. Grade appeals, discrimination claims, disability accommodations—every evaluation that cannot be reduced to objective criteria becomes a potential lawsuit.
There are financial models built on economies of scale. Lecture halls that seat 300 students. Online programs that enroll thousands. Standardized curricula that can be delivered by adjuncts with minimal training. The entire cost structure assumes procedural teaching: one faculty member transmits knowledge to many students simultaneously.
There are students and parents who optimize for credentials, not capability. As long as a degree signals employability—even if that signal is weakening—market pressure pushes toward easier grading and higher completion rates. The university that demands more gets fewer applications in an increasingly competitive enrollment market.
There are international partnerships, visa sponsorships, regulatory compliance requirements, union contracts, endowment restrictions, donor expectations, athletic commitments, campus infrastructure, legacy systems, and institutional memory spanning decades.
None of these components are stupid. None are deliberately resisting change. But each has momentum in a particular direction, and changing course requires convincing all of them simultaneously—or accepting structural damage.
This is why Wang Laboratories no longer exists.
III. The Wang Warning
In 1988, Wang Laboratories employed 31,500 people and generated $3 billion in revenue. They were a dominant force in word processing and minicomputers. Their equipment was in offices across America.
By 1992, they had filed for bankruptcy.
What happened? They saw the personal computer revolution coming. They held board meetings about it. They commissioned studies. They even developed PC products. But they couldn’t pivot fast enough. Their sales force was trained to sell minicomputers to IT departments. Their service network was built around proprietary systems. Their engineering culture centered on integrated hardware-software solutions. Their profit margins depended on hardware sales.
IBM faced the same threat and barely survived. They underwent a wrenching restructuring in the early 1990s, cut over 100,000 jobs, nearly died, and emerged as a different company focused on services and enterprise software.
Kodak saw digital photography coming. They invented the digital camera in 1975. But their business model depended on film sales and photo processing. They couldn’t cannibalize their core revenue fast enough. By the time they tried to pivot seriously, smartphones had eliminated the consumer camera market entirely.
Blockbuster saw Netflix coming. They had the capital to compete. They even launched their own streaming service. But their stores leased expensive retail space. Their revenue model depended on late fees. Their brand was built around the experience of browsing physical media. The structural commitments were too deep. By the time leadership was ready to abandon the old model completely, it was too late.
These weren’t failures of vision. Reed Hastings didn’t see something that Blockbuster executives missed. Both companies looked at the same data about broadband adoption and streaming technology. The difference was structural inertia.
Blockbuster was a supertanker trying to turn in three miles. Netflix was a speedboat that could reverse direction in seconds.
Now ask yourself: are universities Blockbuster or Netflix in this story?
IV. How Innovation Actually Happens
Here’s the pattern across industries:
Large corporations don’t innovate by redesigning their entire operation simultaneously. They innovate through parallel experiments in protected environments, then selectively adopt what works.
In the 1970s, Xerox PARC (Palo Alto Research Center) invented the graphical user interface, Ethernet networking, laser printing, and Smalltalk, the language that popularized object-oriented programming. Xerox didn’t immediately restructure its copier business around these inventions. PARC operated semi-independently, exploring possibilities without quarterly revenue pressure. When innovations proved viable, some got integrated into Xerox products. Others—famously, the GUI—got commercialized by other companies. But PARC’s existence allowed Xerox to stay relevant in a transforming industry.
Lockheed’s Skunk Works developed the U-2 spy plane, the SR-71 Blackbird, and the F-117 stealth fighter by operating outside normal military procurement processes. Small teams, minimal oversight, rapid iteration, tolerance for failure. When projects succeeded, they got adopted into the larger defense infrastructure. When they failed, the failures didn’t threaten the parent organization.
Google’s “20% time” allowed engineers to work on pet projects outside their main responsibilities. Gmail, Google News, and AdSense emerged from this protected experimentation time. Not every project worked. Most didn’t. But the few successes generated enough value to justify the entire program.
Amazon’s “two-pizza teams” operate as independent units with minimal bureaucratic oversight. AWS started as an internal infrastructure project. Alexa began as a research experiment. The Kindle came from a team insulated from Amazon’s retail operations. When these experiments proved market viability, they scaled. When they failed, they shut down without endangering the core business.
The pattern: innovation happens at the edges of large organizations, in protected spaces with different rules, faster decision cycles, and tolerance for failure.
The successful large organization doesn’t try to make the entire supertanker agile. It launches speedboats alongside the supertanker to scout ahead, test new routes, and report back on what works.
V. The University Equivalent
Now consider higher education.
Most universities have entrepreneurship programs. They teach students about startups. They host pitch competitions. Some have venture funds that invest in alumni companies.
But how many universities operate innovation labs for their own pedagogical models?
How many have protected environments where faculty can experiment with judgment-based assessment, project-driven learning, and accountability-focused curricula—without triggering accreditation reviews, union grievances, or enrollment pressure?
How many treat educational R&D the way corporations treat product R&D—as necessary investment in future viability rather than distraction from core operations?
The answer: very few.
Most educational innovation happens through normal committee processes. A faculty member proposes a new course structure. It goes through curriculum committee review. Department approval. College approval. University approval. Accreditation review if it’s substantial enough. Implementation the following academic year if everything goes smoothly.
Timeline: 18 to 36 months. For a single course.
Meanwhile, large language models improve on quarterly release cycles. The labor market reprices skills in real time. Students graduate into an economy that shifted while their program was being designed.
You cannot run experiments on 18-month cycles when the environment changes every quarter.
VI. The Rogue Laboratory
In 2023, while universities debated whether to ban ChatGPT in classrooms, Nik Bear Brown was running a different kind of experiment.
Brown holds a PhD in Computer Science from UCLA and teaches at Northeastern University—a school that markets itself aggressively on “experiential learning.” He’d watched international students complete expensive master’s programs, graduate with credentials, enter Optional Practical Training with a visa clock ticking, and discover they had no portfolio, no documented judgment, and no clear path to employment.
The degree was valid. The learning was insufficient.
So Brown founded Humanitarians AI, a 501(c)(3) nonprofit that functions as something between a research lab, an apprenticeship program, and a compressed PhD experience. No tuition. No grades. No guaranteed timeline. Students work on real projects—peer-reviewed publications, shipped software systems, AI literacy programs for schools—and leave when they’re ready or when they get jobs.
This is the educational equivalent of a skunk works.
It operates outside institutional bureaucracy while maintaining academic rigor and legal OPT compliance. It doesn’t need to satisfy accreditation bodies. It doesn’t have tenure committees. Faculty volunteer time because they want to mentor, not because it counts toward promotion. Students self-select for judgment development, not credential collection.
The model is simple. Students contribute at different levels depending on readiness:
Research-ready contributors work on publication-grade projects. They co-author papers submitted to peer-reviewed journals—Humanitarians AI targets 10-20 publications per year. They build systems used in actual research and practice, not demos for a grade.
Applied contributors work on backend infrastructure, AI/ML implementation, product management, documentation, system integration. They ship real systems with real users. When something breaks, there are consequences.
Foundational learners—often graduates who “learned very little” in their previous programs—learn core skills properly. They prove understanding by teaching others, creating documentation and learning materials that get used by the next cohort.
All work is documented through what Brown calls the Boyle System, named for Robert Boyle’s meticulous experimental logs. Students maintain detailed weekly records: methods used, decisions made, failures encountered, iterations attempted. Every four weeks, renewal depends on documented contribution.
This isn’t punitive. It’s pedagogical. If you can’t explain what you did, why you did it, what failed, and what changed, you’re not ready for professional work. Non-renewal isn’t punishment. It’s an honest signal.
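To make that record-keeping concrete, here is a minimal sketch of what a Boyle-style weekly log entry might look like as a data structure, with a naive four-week renewal check. The field names, the check, and the four-week threshold are illustrative assumptions for this essay, not the program’s actual format.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class WeeklyLogEntry:
    """One week of documented work: what was tried, decided, and learned."""
    week_of: date
    methods_used: list[str]   # tools, techniques, and experiments attempted
    decisions: list[str]      # judgment calls made, with the reasoning behind them
    failures: list[str]       # what did not work, and why
    iterations: list[str]     # what changed in response to failure or feedback

def renewal_warranted(log: list[WeeklyLogEntry]) -> bool:
    """A naive check: the last four weeks must each show documented decisions
    and at least one honest record of failure or iteration."""
    recent = log[-4:]
    return len(recent) == 4 and all(
        entry.decisions and (entry.failures or entry.iterations) for entry in recent
    )
```

The point of structuring the log this way is that reasoning, not polish, is what gets reviewed: a week with documented decisions and an honest failure counts; a week of unexplained output does not.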
The model can tolerate failure in ways universities cannot. When a project doesn’t work, it shuts down without threatening anyone’s accreditation. When a pedagogical experiment fails, it generates learning rather than litigation risk. When a student isn’t ready, they leave without grade appeals or refund demands.
This is what innovation incubators do: provide protected space to test ideas that might not work.
VII. What Judgment Actually Looks Like
Here’s a scenario that happens regularly at Humanitarians AI:
You’re building an AI literacy curriculum for undergraduate courses. The stated requirement: teach students about AI capabilities and limitations. You start designing lessons on how large language models work, on training data bias, on hallucinations.
Then you talk to teachers. They don’t want technical explanations. They want: “How do I stop my students from using ChatGPT to cheat on essays?”
You talk to students. They don’t want lectures on bias. They want: “How do I use this to actually learn faster?”
You talk to parents. They don’t want either. They want: “Is this thing going to replace my kid’s future job?”
Now you have three different stakeholders with three different actual needs, none of which match the stated requirement. The procedural response: build what you were asked to build—a curriculum on AI capabilities and limitations.
The judgmental response: recognize that the problem is stakeholder misalignment, not curriculum content. The real task is facilitating a conversation about what AI literacy means for different groups, then building modular materials that serve multiple needs.
That reframing—from “build the curriculum” to “solve the misalignment”—is judgment. You can’t learn it from a textbook or a lecture. You learn it by encountering messy requirements, making decisions under ambiguity, and seeing what happens when those decisions meet reality.
This is what students at Humanitarians AI encounter constantly. Projects with fluid requirements. Stakeholders with conflicting interests. Situations where following the stated instructions would produce technically correct but ultimately useless work.
The difference between procedure and judgment isn’t complexity. It’s whether you can be held accountable for recognizing that the task itself is wrong.
VIII. The Inversion
Here’s what actually changed, and most people miss it:
A software engineer used to spend 90% of their time writing code and 10% making judgment calls about what to build, how to architect it, and whether it solved the right problem.
Now AI can generate that code in seconds.
The ratio inverted. The work is now 90% judgment and 10% procedural execution.
This isn’t about “toggling between procedure and judgment.” That understates the transformation. The procedural work still exists—you still need to write code, debug systems, implement features. But it’s no longer where you spend your time.
Consider what actually happens now when you build software professionally:
You spend three hours in meetings with stakeholders trying to understand what they actually need (not what they say they want). That’s judgment.
You spend two hours evaluating whether the feature request makes sense given the existing system architecture and future roadmap. That’s judgment.
You spend thirty minutes prompting an AI to generate the initial code implementation. That’s procedural, augmented by AI.
You spend an hour reviewing that code to verify it actually does what you intended, doesn’t introduce security vulnerabilities, and won’t create maintenance nightmares. That’s judgment.
You spend another two hours debating with your team whether this feature should ship now or wait for a more comprehensive solution. That’s judgment.
You spend fifteen minutes having AI refactor the code based on your architectural decisions. That’s procedural, augmented by AI.
You spend an hour documenting not just what the code does, but why you made the choices you made, what tradeoffs you accepted, and what future developers need to know. That’s judgment.
The code generation—the part that used to consume entire days—now takes forty-five minutes total. The judgment, stakeholder management, and accountability work takes nine hours.
The work didn’t become a mix of procedural and judgmental skills. The work became primarily judgment, with procedural execution as a minor supporting activity.
And here’s the brutal part: most education still trains students for the 90% that AI just eliminated.
AI can simulate judgment by recognizing patterns in its training data. What it cannot do is own the consequences of that judgment when requirements conflict, stakeholders disagree, or goals turn out to be incoherent.
Someone must decide which risks are acceptable, which values take precedence, which rules can be bent. Someone must take responsibility when the decision proves wrong.
That ownership—that accountability—is what resists automation. And it’s what most educational systems don’t teach.
IX. Why Universities Don’t Teach This
Walk through a typical master’s program in data science, computer science, or engineering. Students complete coursework emphasizing techniques. They learn algorithms, frameworks, tools. They complete a capstone project—usually simulated, low-stakes, over-scaffolded. Assessment rewards correctness, not reasoning. Documentation is optional. Career outcomes are assumed, not engineered.
The students graduate with valid degrees. They’ve memorized facts, applied standard formulas, executed routine tasks on tests. They’ve been trained exactly like large language models are trained: through exposure to patterns and evaluation on recall.
Then they enter the labor market and discover that everything they were certified to do can now be done by ChatGPT for $20 a month.
This isn’t malice. It’s not even incompetence. It’s structural optimization for a different environment.
Universities were designed to certify competence in an economy where competence was scarce. The lecture hall exists because you needed one expert to transmit knowledge to hundreds of students simultaneously—when knowledge couldn’t be distributed any other way. The standardized exam exists because you needed reliable signals of mastery when credentials were the only practical way to evaluate capability. The semester system exists because agricultural societies needed students home for planting and harvest.
These structures made sense for the world they were designed for.
But that world is gone. Knowledge is abundant and free. Credentials predict capability poorly. Students aren’t farmers. And the skills that made someone employable in 2015 don’t in 2025.
The institution isn’t broken because it’s doing something wrong. It’s broken because it’s still doing what it was designed to do—in an environment where those design specifications no longer match reality.
Fixing this doesn’t require better teaching of existing content. It requires admitting that the entire optimization function is wrong.
X. The Conductor’s Paradox
But you can’t just abandon procedural training and teach only judgment. That fails too.
Consider a symphony conductor. They don’t just wave their arms aesthetically. They hear each instrument, understand how sections interact, recognize when the second violins are dragging the tempo, know which compromises to make when the oboist is sick and the clarinetist must cover.
That knowledge comes from having played an instrument, studied scores, internalized musical structure through practice. You cannot conduct an orchestra you don’t understand.
The educational equivalent: you cannot orchestrate AI systems without understanding what they’re doing. You cannot evaluate whether a language model’s output is reasonable without knowing what makes code good or bad. You cannot judge whether a statistical claim is plausible without understanding statistical reasoning.
This creates what Brown calls the Conductor’s Paradox: students must still learn foundational procedural skills, but they must learn them as a means to achieve adaptive expertise, not as an end in themselves.
The traditional model fails here. It teaches procedure for two years, then judgment as an afterthought in a capstone course. By then, students have developed learned helplessness—they’ve been rewarded for following instructions, not questioning them.
The Humanitarians AI model runs procedure and judgment in parallel from day one. Foundational learners aren’t just completing tutorials. They’re teaching others—which requires understanding deeply enough to explain. Applied contributors aren’t just writing code. They’re making architectural decisions with supervision. Research contributors aren’t just running analyses. They’re formulating hypotheses and defending them.
The toggle between procedure and judgment isn’t something you learn after mastering procedure. It’s something you practice from the beginning, with stakes gradually increasing.
But you cannot test this approach within normal university structures. The accreditation body wants to see rubrics. The legal department wants objective grading criteria. The financial model needs standardized delivery. The faculty handbook specifies credit hours and contact time.
You need a skunk works where different rules apply.
XI. What the Experiment Has Produced
After two years, Humanitarians AI has generated:
Multiple peer-reviewed publications with student co-authors. Shipped AI systems used by actual organizations. Programs including Botspeak (AI fluency education in schools), Lyrical Literacy (using AI-generated music for literacy development), and a Fellows Program training the next generation of AI educators.
Students leave with portfolios employers can inspect: code repositories, published papers, documented decision-making under uncertainty. They leave with references from faculty who watched them handle real problems, not simulated ones. Some leave with businesses they launched during their fellowship.
This is actual experiential learning: not simulated, not sanitized, not optimized for marketing. Real work, done under real constraints, with real consequences.
More importantly, the model has proven that judgment-based, accountability-driven, project-centered learning is operationally feasible. You can document it rigorously enough to satisfy OPT compliance. You can scale it to dozens of students without collapsing under coordination costs. You can attract volunteer faculty despite no tenure benefits. You can attract students despite no credential at the end.
It works. The question is whether it can scale—and whether universities will adopt it.
XII. The Adoption Pathway
Here’s how innovation historically moves from skunk works to mainstream:
Phase 1: Proof of concept. Small external organizations or protected internal teams demonstrate the model works. They publish outcomes data. The results are compelling but localized.
Phase 2: Early adoption by risk-tolerant institutions. Forward-thinking departments at universities experiment with elements of the model. An engineering program adds intensive project-based learning. A business school implements judgment-based assessment. They face internal resistance but produce success stories.
Phase 3: Competitive pressure builds. As employment outcomes diverge—graduates from judgment-based programs command premium salaries and advance faster—universities face market pressure. Students begin choosing programs based on demonstrated outcomes rather than reputation alone. Alumni giving becomes contingent on program quality, not nostalgia.
Phase 4: Structural enablers emerge. Accreditation bodies update standards to accommodate judgment-based assessment. Professional organizations develop frameworks for evaluating programs. Legal precedent establishes that subjective evaluation of judgment is defensible when properly documented. State education departments revise regulations.
Phase 5: Integration becomes standard. What was radical becomes normal. Universities restructure around the new model. Faculty are trained in mentorship-based pedagogy. Assessment systems are rebuilt. Degree requirements shift from credit hours to demonstrated capability. Within a generation, the old model looks as antiquated as rote memorization and corporal punishment.
Medical education went through this process with problem-based learning, pioneered at McMaster University in the 1960s. It took until the 1990s for widespread adoption. Thirty years.
Engineering went through this with cooperative education, pioneered at the University of Cincinnati in 1906. It became mainstream in the 1980s. Seventy-five years.
AI is compressing timelines. The labor market is shifting not over decades but over quarters. Students graduating today with procedural skills are discovering their education is obsolete before they pay off loans.
Universities don’t have thirty years to adapt. They probably don’t have ten.
XIII. The Scalability Question
Now the uncomfortable part: can judgment-based education scale to hundreds of thousands of students?
The honest answer: probably not in its current intensive form.
Humanitarians AI functions with volunteer faculty who mentor because they want to, not because it advances their careers. That works at 20-30 students. It works at 100 with careful coordination. Does it work at 1,000? At 10,000?
But that’s not the right question. The right question is: what parts of this model can scale, and what parts must remain high-touch?
Some elements scale better than expected:
Project-based learning scales through modularity. Not every student needs one-on-one mentorship every week. They need it at decision points—when requirements conflict, when approaches aren’t working, when judgment calls must be made. The rest of the time, they work semi-independently with peer feedback.
Documentation scales through templates and systems. The Boyle System isn’t one-on-one mentorship. It’s a structured framework for self-reflection that faculty review periodically. Students learn to evaluate their own reasoning, catch their own errors, identify their own knowledge gaps. One faculty member can review documentation for 20 students more easily than providing 20 hours of individual mentorship.
Peer teaching multiplies faculty capacity. When foundational learners must teach others to prove understanding, they’re not consuming faculty time—they’re multiplying it. One expert trains ten advanced students who each train ten foundational students. The model becomes fractal.
AI tools can scale feedback loops. Ironically, the same AI systems disrupting employment can help scale judgment education. Language models can provide initial feedback on reasoning quality, flag logical fallacies, suggest alternative framings. They can’t replace expert judgment, but they can triage which work needs human review urgently versus which can wait.
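As an illustration of that triage idea, the sketch below asks a language model to score the documented reasoning in each weekly log and flags the entries a mentor should read first. The prompt, the 1-to-5 scale, the threshold, and the model choice are all assumptions made for this example; nothing here is Humanitarians AI’s actual tooling, and any chat-completion API would serve.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-completion API would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIAGE_PROMPT = """You are reviewing a student's weekly work log from a project-based program.
Rate the documented reasoning from 1 (no evidence of judgment) to 5 (clear decisions,
honest failures, thoughtful iteration). Reply with the number only.

Log entry:
{entry}"""

def triage(entries: list[str], urgent_below: int = 3) -> list[tuple[str, int]]:
    """Return (entry, score) pairs for logs that likely need prompt human review."""
    flagged = []
    for entry in entries:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "user", "content": TRIAGE_PROMPT.format(entry=entry)}],
        )
        try:
            score = int(response.choices[0].message.content.strip())
        except ValueError:
            score = 0  # unparseable output should get a mentor's attention too
        if score < urgent_below:
            flagged.append((entry, score))
    return flagged
```

The design point: the model only orders the review queue. A mentor still reads every flagged entry and still makes the renewal call.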
The model that emerges probably looks like: large cohorts for foundational procedure, smaller groups for applied projects, intensive mentorship for research-level judgment work.
Universities already do this in PhD programs. The innovation is extending it earlier and making it operationally efficient.
XIV. The Real Test
What would prove Humanitarians AI’s model is ready for mainstream adoption?
Longitudinal employment data. Do Fellows command higher starting salaries? Do they advance faster? Do they switch jobs more successfully when labor markets shift? Do employers hire them preferentially when given a choice? Five years of data would be definitive. Three years would be strongly suggestive.
Publication outcomes. How many Fellows co-author peer-reviewed papers compared to typical master’s students? What’s the citation impact of those publications? Do they lead to PhD admissions or research positions at higher rates?
Retention and completion metrics. What percentage of Fellows complete programs versus leave early? Do the four-week renewal cycles effectively filter for judgment capability without being unnecessarily harsh? Do students who don’t renew still benefit from their time in the program?
Cost structure transparency. What does this actually cost per student when you account for volunteer faculty time, overhead, and materials? How much must be covered by philanthropy versus tuition or employer sponsorship? What’s the path to financial sustainability?
Replicability. Can faculty from other institutions implement the model successfully without Brown’s direct involvement? Is the model portable across disciplines? Can it work in healthcare education, business education, humanities?
Failure modes documented. What doesn’t work? Which students struggle? What trade-offs are being made? What problems remain unsolved?
These aren’t rhetorical questions. They require rigorous data collection over time, published transparently—including negative results.
But there’s a deeper question: are universities actually waiting for proof, or are they waiting for permission?
Because the labor market is already proving that procedural skills are commoditizing and judgment is commanding a premium. Universities don’t need more data to know this. They need permission from their own structural constraints to act on what they already know.
That permission comes from successful external experiments that prove it’s possible to restructure without institutional collapse.
XV. Why Most Experiments Will Fail
Here’s the part that doesn’t make it into the success stories: most educational innovation experiments fail.
They fail because they’re underfunded. They fail because key faculty leave for better opportunities. They fail because they can’t attract enough students. They fail because they solve the wrong problem. They fail because what works at 10 students breaks at 100. They fail because the data doesn’t support the hypothesis.
Humanitarians AI might fail. The model might not scale. The employment outcomes might prove mediocre. The documentation burden might become unsustainable. The volunteer faculty model might collapse. The legal complexity of OPT sponsorship might prove unworkable at larger scale.
This is how innovation works. You don’t know which experiments will succeed until you run them. Venture capitalists expect 90% of startups to fail, which is why they invest in portfolios rather than individual bets. Pharmaceutical companies expect most drug candidates to fail clinical trials.
Universities should approach educational innovation the same way. Not betting everything on one model, but running multiple experiments simultaneously and adopting what works.
The university that only experiments within its core operations—the one that treats every pedagogical change as a high-stakes restructuring of degree requirements—is like the Fortune 500 company that refuses to fund R&D because it might not produce immediate revenue.
You cannot innovate by committee and consensus when the environment is shifting faster than committees can meet.
XVI. What Universities Can Learn From Corporations
IBM nearly died in the early 1990s. They saw personal computers coming. They built PC products. But their sales force was trained to sell mainframes. Their service network was built around enterprise installations. Their engineering culture centered on integrated systems. Their profit margins depended on hardware.
They underwent a wrenching restructuring. Cut 100,000+ jobs. Nearly went bankrupt. Emerged as a different company focused on services and enterprise software.
But they survived because they were willing to cannibalize their core business before someone else did.
Microsoft faced obsolescence when cloud computing and mobile devices threatened Windows and Office. They restructured around Azure and cloud services. They open-sourced significant technology. They acquired companies that operated differently than their core business. They succeeded because Satya Nadella was willing to say, in effect: “Our identity is not our products. Our identity is enabling productivity.”
Intel is currently struggling with this transition. They dominated PC processors for decades. Now computing is moving to mobile, cloud, and AI accelerators. Their manufacturing process was their advantage. Now TSMC can manufacture better. Their x86 architecture was the standard. Now ARM is winning mobile. They see the shifts happening. Whether they can turn the ship fast enough remains unclear.
The pattern: survival requires being willing to obsolete yourself before the market does.
Universities face the same choice.
The university that says “Our identity is delivering degree credentials” will become obsolete as degrees become ever-weaker predictors of capability.
The university that says “Our identity is developing human judgment and capability” has a chance—if it’s willing to restructure around that identity even when it means abandoning lecture halls, dismantling credit-hour systems, and accepting that some traditional faculty roles become obsolete.
That restructuring is existentially threatening to individuals within the institution. Which is precisely why it cannot happen through normal institutional processes. It requires protected experiments that demonstrate new models without immediately threatening existing jobs and structures.
You launch the speedboats before you abandon the supertanker.
XVII. The Innovation Infrastructure
What would it look like for universities to systematically run educational experiments?
Some universities are starting to build this infrastructure:
Arizona State University’s learning engineering initiatives run controlled experiments on pedagogical approaches, measure outcomes rigorously, and publish results. They’ve tested hundreds of course designs and identified which elements actually improve learning.
Georgia Tech’s online master’s in computer science experimented with massive scale at low cost. They proved you can deliver graduate education for $7,000 total tuition while maintaining quality comparable to on-campus programs. Now dozens of universities are replicating the model.
Minerva University built an entirely new institutional model from scratch, emphasizing active learning, global mobility, and habit-of-mind development rather than content coverage. Whether it scales beyond boutique enrollment remains to be seen, but they’re generating valuable data on alternative structures.
Southern New Hampshire University’s College for America offers competency-based degrees where students advance by demonstrating mastery rather than accumulating credit hours. Employers sponsor employees to complete degrees while working. This tests whether time-based credentials are necessary.
These experiments exist. But they’re scattered, underfunded relative to core operations, and often viewed as peripheral to the “real” university.
Imagine if universities treated educational R&D the way pharmaceutical companies treat drug development: 15-20% of budget dedicated to testing new approaches, rigorous outcome measurement, rapid iteration, publication of results including failures, systematic adoption of what works.
What would that look like operationally?
Dedicated innovation units with different rules than regular academic departments. Protection from accreditation reviews during experimentation. Ability to hire faculty on different contracts. Freedom to test radical structural changes.
Substantial funding for multi-year experiments. Not grant-dependent pilot programs that disappear when funding runs out, but sustained investment in testing alternatives.
Rigorous outcome measurement with longitudinal employment tracking, skill assessment, and comparative analysis against traditional programs. Data collection built into program design, not added as an afterthought.
Rapid iteration cycles. Programs that can restructure every semester based on emerging data, not every five years based on accreditation schedules.
Transparent publication of results—including failures. The innovation unit that never publishes a failed experiment is lying about its failure rate.
Systematic adoption pathways for successful experiments. Clear processes for how innovations move from protected labs into mainstream programs when evidence supports it.
This infrastructure doesn’t exist at most institutions. Which is why most educational innovation happens in external organizations like Humanitarians AI, or in corporations building their own training programs because universities can’t meet their needs fast enough.
XVIII. The Corporate Alternative
Here’s the scenario universities should fear: corporations stop waiting.
Google built Google University internally. Amazon built Amazon Technical Academy. Apple runs Apple University. These aren’t traditional educational institutions. They’re corporate training programs that teach exactly what the companies need, structured around judgment development rather than credential certification, with immediate feedback loops between learning and application.
When Microsoft needs AI engineers, they don’t just hire from universities. They run internal training programs that take employees with adjacent skills and upskill them in months, not years. The curriculum focuses on what actually matters for production systems. Assessment is binary: can you ship working code that solves real problems?
When healthcare systems need nurses with specialized skills, they build their own residency programs rather than waiting for nursing schools to update curricula.
When consulting firms need strategic thinkers, they run intensive training programs for new hires because MBA programs don’t reliably produce what they need.
The trend is clear: employers with resources are building their own educational infrastructure because universities aren’t adapting fast enough.
This should terrify higher education. You’re being disintermediated by your own customers.
The counter-argument: corporations optimize for narrow skills, not broad education. True. But students increasingly don’t care about broad education if it doesn’t lead to employment. And employers increasingly don’t care about broad education if it doesn’t predict capability.
The university that says “We provide holistic development and critical thinking” sounds wise until students respond “Okay, but can you also help me get a job?”
XIX. What Students Actually Need
Strip away the institutional complexity and ask: what do students need to succeed in the AI economy?
They need to learn how to toggle between procedural execution and judgmental reasoning. Not procedural skills alone. Not judgment alone. The rapid oscillation between both under time pressure.
They need to practice this in environments with real consequences. Not simulations. Not case studies. Real projects where their decisions matter, where failure teaches lessons, where success builds genuine confidence.
They need documentation of their judgment, not just credentials. Portfolios that employers can inspect. Published work that demonstrates capability. References from people who watched them handle ambiguous situations.
They need to develop domain expertise in something specific. Not generalized “critical thinking” detached from application, but judgment grounded in understanding how a particular domain actually works—whether that’s software engineering, healthcare, policy analysis, or business operations.
They need to learn faster than four years allows. The labor market shifts every quarter. The degree program designed in 2020 graduates students into 2024. The lag is unsustainable.
Can traditional universities provide this?
Some parts, yes. The domain expertise. The credential (for now). The broad education that provides context.
But the judgment development, the real-consequences learning, the rapid iteration—these work better in structures designed for them.
Which is why universities need their skunk works.
XX. The Path Forward
Here’s what should happen:
Universities should fund external innovation labs that operate outside normal bureaucratic constraints. Not as threats to replace the institution, but as R&D arms that test what the institution might eventually adopt.
They should run multiple experiments simultaneously. Not betting on one model, but funding diverse approaches and measuring outcomes rigorously. Most will fail. Some will succeed. The successes inform mainstream restructuring.
They should create adoption pathways for proven innovations. When an external experiment demonstrates superior outcomes over three years, what’s the process for integrating those practices into regular programs? Who decides? How fast can it happen?
They should acknowledge that their current optimization function is obsolete. The labor market has shifted. Continuing to optimize for credential efficiency and procedural certification is like Blockbuster optimizing late-fee collection in 2008.
They should accept that meaningful restructuring is existentially threatening to individuals within the institution—and plan for that. Faculty whose expertise is lecturing will need retraining or retirement. Administrators who specialize in credit-hour management will need new roles. Students who expected easy degrees will resist harder standards.
None of this is comfortable. All of it is necessary.
The alternative is obsolescence. Not immediate. Not total. But slow erosion of relevance as employers bypass universities, students choose faster pathways, and credentials become weaker predictors of capability each year.
Wang Laboratories had five years from market leader to bankruptcy. Blockbuster had seven years from peak to irrelevance. Kodak lingered longer but lost 90% of their workforce.
How long does higher education have?
XXI. Why Humanitarians AI Matters
Humanitarians AI is not trying to replace universities. It’s not scaling to enroll millions of students. It’s not competing for traditional degree-seekers.
It’s proving that an alternative model works. That judgment-based, accountability-driven, real-consequences learning produces graduates employers actually want. That you can document judgment rigorously enough to satisfy legal requirements. That you can run this operationally without infinite resources.
It’s serving as the speedboat alongside the supertanker, testing routes the supertanker might eventually follow.
When a university provost says “We can’t change our assessment system because accreditation won’t allow subjective evaluation,” Humanitarians AI provides an existence proof: here’s how you structure subjective evaluation to satisfy legal requirements.
When a department chair says “Students won’t accept judgment-based grading where effort doesn’t guarantee high marks,” Humanitarians AI shows: students self-select into programs that demand judgment when the value proposition is clear.
When a CFO says “Intensive mentorship is too expensive to scale,” Humanitarians AI demonstrates: here are the parts that scale and the parts that don’t, and here’s the cost structure.
When a faculty senate says “This would be career suicide for untenured faculty,” Humanitarians AI proves: volunteer faculty will contribute when the work is meaningful, even without career benefits.
Every objection to educational restructuring gets tested empirically. Some objections prove valid. Some prove surmountable. The data distinguishes between real constraints and institutional learned helplessness.
This is what innovation labs do. They don’t prove that change is easy. They prove that change is possible—and they map the path.
XXII. What You Should Do
If you’re a student: stop optimizing for grades. Start building portfolios of work where you made actual decisions under ambiguity. Seek environments where you can encounter messy problems, make calls, and see consequences. Document your reasoning. You need evidence of judgment, not just credentials.
If you’re faculty: push for protected spaces to experiment with judgment-based learning. Even within existing bureaucracy, you can run modified projects in your courses. Document what works. Publish results. Build the evidence base that forces institutional change.
If you’re an administrator: fund innovation labs. Give them different rules than regular programs. Protect them from the normal bureaucratic requirements that prevent experimentation. Measure outcomes rigorously. When something works, create pathways to mainstream adoption.
If you’re an employer: evaluate portfolios, not just credentials. Test judgment through scenario-based interviews. Hire from programs that produce documented capability. Your hiring decisions create market pressure that forces educational change.
If you’re a policymaker: update accreditation standards to accommodate judgment-based assessment. Fund rigorous evaluation of educational experiments. Protect institutions that innovate from frivolous litigation. Remove regulatory barriers to structural change.
And if you’re building educational experiments: document everything. Publish outcomes transparently, including failures. Make your model easy to copy. The goal isn’t to build an empire. The goal is to prove restructuring is possible—and to force the larger system to evolve.
XXIII. The Urgency
AI language models achieved human-level performance on the SAT, the bar exam, and medical licensing exams by training on the same materials students train on. What this reveals: standardized tests measure exactly the cognitive skills that AI excels at—pattern recognition, recall, and application of learned procedures.
Universities have spent decades optimizing for success on these measures. They’ve designed curricula around them, hired faculty who excel at teaching them, built assessment systems that reliably measure them.
Those skills now have near-zero marginal value in the labor market.
This isn’t a slow transition universities can adapt to gradually. It’s a phase change happening in quarters, not decades.
The class that graduated in 2022 competed for jobs mostly against other humans. The class graduating in 2025 competes against AI systems in every role involving pattern-matching. The class entering in 2025 will graduate in 2029 into a labor market that may look nothing like the one their program was designed for.
Universities see this happening. But seeing the iceberg and turning the ship are different problems.
You cannot restructure a 40,000-student research university in three years through normal institutional processes. The bureaucracy is too deep. The structural commitments too extensive. The stakeholder conflicts too intense.
But you can fund external experiments that prove alternative models work. You can run protected labs that test new approaches. You can create pathways to adopt successful innovations into mainstream operations.
You can launch speedboats alongside the supertanker.
That’s what Humanitarians AI represents: not a replacement for universities, but proof that restructuring is possible—and evidence for what that restructuring might look like.
The employment floor didn’t collapse everywhere. But the ceiling moved. The premium is no longer on procedural competence. It’s on judgment under ambiguity. On accountability for decisions. On the ability to navigate uncertainty and own the consequences of your calls.
Some universities will adapt. They’ll fund innovation labs, adopt proven models, restructure around judgment development. They’ll accept temporary chaos and structural conflict in exchange for long-term relevance.
Others will optimize for institutional stability, prioritize existing stakeholders over future students, and believe that reputation and inertia will sustain them indefinitely.
History suggests which strategy succeeds.
The supertanker is turning. Very slowly. Whether it completes the maneuver before the iceberg depends on decisions made now—not by visionaries, but by administrators willing to fund uncomfortable experiments whose results might threaten their own institutions.
The hard part isn’t knowing what to do. The hard part is accepting that doing it requires launching speedboats you don’t fully control, testing models that might fail spectacularly, and admitting that your current optimization function is obsolete.
Universities need their skunk works. Not someday. Now.
The question isn’t whether to innovate. The question is whether you’ll do it fast enough to matter.


