
Python CLI Monolith vs Modular: 4000 LOC, 18 Subcommands (2026)

An r/Python post asked when to split. Decision rule: coupling + navigability, not LOC. Module-per-subcommand pattern; framework choice secondary.


An r/Python post described a 4,000 LOC single-file Python CLI: argparse, 18 subcommands, stdlib plus pyyaml only, tests in a separate directory. The OP asked when to split. The honest answer: coupling, not LOC.

Why single-file works

The OP's reasons are real. One file to grep. One wheel to ship. No package layout decisions to onboard contributors through. For tools in this size range, single-file is a feature, not a smell.

What LOC alone misses

4K LOC with high coupling (subcommands share state heavily) reads differently than 4K LOC with 18 mostly-independent subcommands sharing only a few utility functions. The second case is genuinely splittable; the first probably shouldn't be.

Real signals to split

  • "I can't navigate it anymore" — valid on its own.
  • A subcommand has 500+ LOC of independent logic.
  • A subcommand has an independent test surface.
  • A subcommand has a different developer-velocity concern (different team, different release cadence).
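The 500+ LOC signal can be checked mechanically rather than eyeballed. A minimal sketch, assuming each subcommand lives in a top-level function of the monolith (the function name and file name here are illustrative, not from the post):

```python
# Rough LOC per top-level function, using ast end_lineno (Python 3.8+).
# Functions well past ~500 lines are candidates for their own module.
import ast

def loc_per_function(source):
    tree = ast.parse(source)
    return {node.name: node.end_lineno - node.lineno + 1
            for node in tree.body
            if isinstance(node, ast.FunctionDef)}
```

Usage would be something like `loc_per_function(open('tool.py').read())`, then sorting the result by size.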

What to ignore

  • "LOC threshold reached." Arbitrary.
  • "Modularity is best practice." Best-practice for what?
  • "Adding pytest test files per module." Tests are already separate.

The split shape

Module-per-subcommand is the most common shape that works. Top-level shared utils stay common; each subcommand owns its module + tests. argparse subparsers route to the right module via importlib.

Text
tool/
  __init__.py
  __main__.py     # argparse dispatcher with importlib routing
  shared/         # common utils (parse_yaml, call_api, pretty_print)
    __init__.py
    io.py
    api.py
  commands/
    __init__.py
    analyze.py    # depends on shared.io, shared.api
    migrate.py
    deploy.py
    ...
tests/
  test_shared.py
  test_analyze.py
  ...

The dispatcher

Python
import argparse
import importlib
import sys

def main():
    parser = argparse.ArgumentParser(prog='tool')
    sub = parser.add_subparsers(dest='command', required=True)
    for name in ['analyze', 'migrate', 'deploy']:  # ...18 subcommands
        sub.add_parser(name)
    # parse_known_args: the dispatcher consumes only the subcommand name;
    # everything else is handed to the command module untouched.
    args, rest = parser.parse_known_args()
    mod = importlib.import_module(f'tool.commands.{args.command}')
    return mod.run(rest)

if __name__ == '__main__':
    sys.exit(main())

What stays the same

Distribution is unchanged: a single wheel still ships. Onboarding stays simple: a new contributor reads commands/<name>.py for the subcommand they care about. Debugging means grepping across the package instead of one file, which modern editors handle easily. The wins from the OP's monolithic setup don't evaporate.
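The single-wheel claim is easy to make concrete with a console-script entry point. A sketch of the relevant pyproject.toml section, assuming standard PEP 621 metadata (the project name and version are placeholders):

```toml
[project]
name = "tool"
version = "0.0.0"

# One installed command, pointing at the dispatcher's main().
[project.scripts]
tool = "tool.__main__:main"
```

Building this still produces one wheel; whether the code behind the entry point is one file or twenty modules is invisible to users.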

The 4K LOC threshold question

At ~5K LOC with high coupling, splitting often hurts. At ~5K LOC with low coupling, splitting often helps. The OP's 4K LOC is right at the threshold; the answer depends on the coupling audit, not on hitting a magic number.

Refactor cost

A 4K LOC single-file CLI with 18 subcommands typically splits in 3-6 hours when coupling is moderate. The work is mostly mechanical: identify shared utils, extract them, move per-subcommand logic to its module, update imports, run the existing test suite.

What about Click / Typer / Fire?

At the OP's scale, framework choice matters less than the architectural choice of when (and how) to split. Click is the natural upgrade from argparse if you want decorator-driven cleanliness. Typer is fine for type-hint-heavy codebases. Fire is for internal tools where ergonomics beats UX polish. None of these change the monolith-vs-modular question directly.

What to do this week

Run the coupling audit. List each subcommand's shared imports, shared state, and shared helpers. If most subcommands hit the same 3-5 utils and don't share state, splitting is fine and probably worthwhile. If subcommands share heavy state, keep monolithic.
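The audit itself can be partly automated. A sketch, assuming the monolith keeps each subcommand as a top-level function (the audit function and its heuristics are illustrative, not from the post): for each function, list which other top-level names (helpers, classes, module-level state) it references.

```python
# Hypothetical coupling audit for a single-file CLI. A function that
# references only a handful of shared helpers is cheap to extract;
# one that reads and writes module-level state is not.
import ast
from collections import defaultdict

def audit(source):
    tree = ast.parse(source)
    # Collect top-level definitions: functions, classes, and assignments.
    top_level = {n.name for n in tree.body
                 if isinstance(n, (ast.FunctionDef, ast.ClassDef))}
    top_level |= {t.id for n in tree.body if isinstance(n, ast.Assign)
                  for t in n.targets if isinstance(t, ast.Name)}
    uses = defaultdict(set)
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for name in ast.walk(node):
            if (isinstance(name, ast.Name) and name.id in top_level
                    and name.id != node.name):
                uses[node.name].add(name.id)
    return dict(uses)

# Usage: audit(open('tool.py').read()) -> {function: shared names it touches}
```

If most entries in the report are the same few utility names, the split is low-risk; if subcommands show up in each other's sets or all touch the same mutable state, that is the coupling the article warns about.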

Verified online in May 2026 against the argparse, Click, and Typer documentation, plus the source post.