An AI-powered enterprise data quality platform built with Python and Streamlit to automate data profiling, cleaning, schema inference, anomaly detection, and reporting workflows.
Cleanlytics AI is a data quality platform designed to reduce manual data cleaning effort. Users can upload datasets, review data quality issues, detect missing values and outliers, infer suitable data types, apply cleaning actions, generate reports, and export cleaned datasets through an interactive dashboard.
Real-world datasets are often messy, incomplete, inconsistent, and difficult to prepare manually. Data analysts spend significant time identifying missing values, incorrect data types, duplicate records, outliers, and formatting issues before analysis or machine learning can begin.
I built Cleanlytics AI to automate major parts of the data preparation workflow. The platform profiles datasets, recommends suitable data types using schema inference logic, detects anomalies using statistical and machine learning methods, supports automated cleaning actions, maintains audit logs, and generates downloadable reports.
Automatically analyzes dataset structure, missing values, column types, unique values, and data quality issues.
Recommends suitable data types such as integer, float, datetime, category, and object based on value patterns and conversion ratios.
Detects abnormal values using IQR, Z-Score, and Isolation Forest, with options for handling the detected anomalies.
Generates professional PDF reports and exports cleaned datasets for future analysis.
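To give a feel for the profiling feature listed above, here is a minimal sketch of a per-column profile built with pandas. The function name `profile_dataframe`, the chosen metrics, and the sample data are illustrative assumptions, not the project's actual code.

```python
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of basic quality metrics per column (hypothetical helper)."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(dropna=True),
        }
    )


# Small made-up dataset with one missing value in each column.
sample = pd.DataFrame(
    {
        "age": [34, None, 29, 41],
        "city": ["Pune", "Delhi", "Pune", None],
    }
)

profile = profile_dataframe(sample)
duplicate_rows = int(sample.duplicated().sum())

print(profile)
print("duplicate rows:", duplicate_rows)
```

A summary table like this is the kind of thing a dashboard page could render directly, with one row per column of the uploaded dataset.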
Python: Used as the core programming language for data processing, cleaning logic, and backend workflows.
Streamlit: Used to build the interactive multi-page dashboard and responsive UI.
Used for dataset manipulation, missing value handling, profiling, and numerical operations.
Used for machine learning-based anomaly detection with Isolation Forest.
Used for interactive visualizations and dashboard charts.
Used for automated PDF report generation.
Solution Applied:
Built automated profiling and cleaning workflows to detect and handle common data quality issues including missing values, incorrect formats, wrong data types, duplicate records, and inconsistent values.
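A cleaning workflow of this kind could look roughly like the sketch below. It assumes pandas, and the specific steps (whitespace trimming, dropping exact duplicates, median imputation) are my illustrative choices rather than a record of what the platform does. The `basic_clean` name and the log format are hypothetical.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Apply a few common cleaning steps and log each action (hypothetical helper)."""
    log: list[str] = []
    cleaned = df.copy()

    # Trim stray whitespace in non-numeric columns so "Pune " and "Pune" match.
    for col in cleaned.columns:
        if not pd.api.types.is_numeric_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )

    # Drop exact duplicate rows and record how many were removed.
    before = len(cleaned)
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)
    log.append(f"dropped {before - len(cleaned)} duplicate rows")

    # Fill missing numeric values with the column median (an assumed strategy).
    for col in cleaned.select_dtypes(include="number").columns:
        n_missing = int(cleaned[col].isna().sum())
        if n_missing:
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
            log.append(f"filled {n_missing} missing values in '{col}' with median")

    return cleaned, log


# Made-up data: a duplicate hidden by trailing whitespace and one missing value.
raw = pd.DataFrame(
    {
        "city": ["Pune ", "Pune", "Delhi", "Delhi"],
        "sales": [100.0, 100.0, None, 300.0],
    }
)

cleaned, log = basic_clean(raw)
print(cleaned)
print(log)
```

Returning the log alongside the cleaned frame keeps every action traceable, which fits the audit log idea described earlier.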
Solution Applied:
Created schema inference logic using numeric conversion ratio, datetime parsing success, uniqueness ratio, and pattern-based checks.
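The sketch below shows one way these four signals could be combined, assuming pandas. The 0.9 and 0.5 thresholds, the ISO date pattern, the order of the checks, and the `infer_column_type` name are all assumptions made for illustration. The actual rules in the project may differ.

```python
import pandas as pd


def infer_column_type(
    series: pd.Series,
    threshold: float = 0.9,           # assumed minimum success ratio
    max_category_ratio: float = 0.5,  # assumed uniqueness cutoff for "category"
) -> str:
    """Suggest a type label for one column (hypothetical helper)."""
    values = series.dropna().astype(str).str.strip()
    if values.empty:
        return "object"

    # Numeric conversion ratio: share of values that parse as numbers.
    numeric = pd.to_numeric(values, errors="coerce")
    if numeric.notna().mean() >= threshold:
        parsed = numeric.dropna()
        return "integer" if (parsed % 1 == 0).all() else "float"

    # Pattern-based check: only attempt date parsing on YYYY-MM-DD strings.
    looks_like_date = values.str.fullmatch(r"\d{4}-\d{2}-\d{2}")
    if looks_like_date.mean() >= threshold:
        # Datetime parsing success: share of values that parse as real dates.
        dates = pd.to_datetime(values, format="%Y-%m-%d", errors="coerce")
        if dates.notna().mean() >= threshold:
            return "datetime"

    # Uniqueness ratio: few distinct values relative to rows suggests a category.
    if values.nunique() / len(values) <= max_category_ratio:
        return "category"
    return "object"


# Made-up columns, one per expected label.
examples = {
    "quantity": pd.Series(["10", "20", "30"]),
    "price": pd.Series(["1.5", "2", "3.25"]),
    "order_date": pd.Series(["2024-01-05", "2024-02-10", "2024-03-15"]),
    "color": pd.Series(["red", "blue", "red", "blue", "red"]),
    "code": pd.Series(["a1", "b2", "c3"]),
}

inferred = {name: infer_column_type(col) for name, col in examples.items()}
print(inferred)
```

Using ratios instead of requiring every value to convert lets a column with a few bad entries still receive a useful recommendation, which the user can then accept or override.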
Solution Applied:
Implemented IQR, Z-Score, and Isolation Forest so users can compare statistical and ML-based detection methods.
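For comparison, the three methods can be sketched side by side on a single numeric column. This assumes NumPy and scikit-learn. The 1.5 IQR multiplier, the Z-Score threshold of 3, and `contamination=0.05` are conventional or assumed values rather than the platform's settings, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def iqr_outliers(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag values more than `threshold` standard deviations from the mean."""
    std = x.std()
    if std == 0:
        return np.zeros(len(x), dtype=bool)
    return np.abs((x - x.mean()) / std) > threshold


def isolation_forest_outliers(
    x: np.ndarray, contamination: float = 0.05, random_state: int = 0
) -> np.ndarray:
    """Flag values that Isolation Forest labels as anomalies (-1)."""
    model = IsolationForest(contamination=contamination, random_state=random_state)
    labels = model.fit_predict(x.reshape(-1, 1))
    return labels == -1


# Made-up data: 40 values between 48 and 52, plus one extreme value at the end.
values = np.array([48, 49, 50, 51, 52] * 8 + [500], dtype=float)

iqr_flags = iqr_outliers(values)
z_flags = zscore_outliers(values)
forest_flags = isolation_forest_outliers(values)

summary = {
    "iqr": int(iqr_flags.sum()),
    "zscore": int(z_flags.sum()),
    "isolation_forest": int(forest_flags.sum()),
}
print(summary)
```

The statistical methods apply a fixed rule to each column, while Isolation Forest flags roughly the fraction of rows given by `contamination`, so the counts can differ even when all three agree on the most extreme value.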
Solution Applied:
Designed a modern Streamlit dashboard with a clean layout, modular navigation, action buttons, cards, and report downloads.
I'd love to discuss the technical details, methodology, and learnings from this project. Feel free to reach out to learn more!