AI / ML ML Pipeline stable

Kedro

Framework for production-quality, reproducible data science code

10.0K stars 250 contributors Since 2019

ML development framework for building reproducible, maintainable pipelines, with a data catalog, a standardized project structure, and visualization tools.

License
Apache-2.0
Min RAM
512 MB
Min CPUs
1 core
Scaling
single_node
Complexity
intermediate
Performance
medium
Self-hostable
K8s native
Offline
Pricing
fully free
Docs quality
excellent
Vendor lock-in
none

Use cases

  • Standardize ML project structure across teams
  • Build reproducible data transformation pipelines
  • Visualize data dependencies with Kedro-Viz
  • Transition from notebooks to production code
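
The pipeline and data catalog ideas behind these use cases can be sketched in plain Python. This is a minimal illustration only and does not use Kedro's API: the names `catalog`, `nodes`, `run`, `drop_missing`, and `mean_price` are hypothetical, and the sample prices are made up. Each node is a pure function with named inputs and a named output, and the execution order is derived from those names rather than from the order in which the nodes are listed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-in for a data catalog: dataset name -> in-memory value.
catalog = {"raw_prices": [10.0, 12.0, None, 14.0]}


def drop_missing(prices):
    """Remove missing entries from a list of prices."""
    return [p for p in prices if p is not None]


def mean_price(prices):
    """Average of a non-empty list of prices."""
    return sum(prices) / len(prices)


# Each node is (function, input dataset names, output dataset name).
# They are deliberately listed out of order to show that order is resolved
# from the data dependencies.
nodes = [
    (mean_price, ["clean_prices"], "mean_price"),
    (drop_missing, ["raw_prices"], "clean_prices"),
]


def run(nodes, catalog):
    """Run every node once all of its inputs are available in the catalog."""
    producers = {out: (func, ins) for func, ins, out in nodes}
    # Map each output dataset to the set of datasets it depends on.
    graph = {out: set(ins) for out, (_, ins) in producers.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in producers:  # raw inputs have no producing node
            func, ins = producers[name]
            catalog[name] = func(*(catalog[i] for i in ins))
    return catalog


result = run(nodes, dict(catalog))
print(result["clean_prices"], result["mean_price"])
```

In an actual Kedro project the equivalent pieces are declared with the framework's own node and catalog abstractions, and datasets are usually described in configuration files rather than in code; the sketch only shows why naming inputs and outputs makes runs reproducible and dependencies visualizable.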

Anti-patterns / when NOT to use

  • Opinionated project structure may not fit all teams
  • Learning curve for the data catalog system
  • Less suited to real-time or streaming workloads
  • Smaller community than Airflow/MLflow

Replaces / alternatives to

  • Custom data pipeline scripts

Technical specs

Language
Python
API type
SDK
Protocols
HTTP
Deployment
pip
SDKs
python

Community

GitHub stars 10.0K
Contributors 250
Commit frequency weekly
Plugin ecosystem none
Backing McKinsey QuantumBlack
Funding corporate

Release

Latest version
Last release
Since 2019

Best fit

Team size
solo, small, medium
Industries
general, consulting, fintech, research

Tags

  • data-engineering
  • reproducible
  • pipelines
  • data-catalog
  • modular-code
  • visualization