Credit Card Fraud Detection & Risk Analytics
An end-to-end machine learning project focused on identifying fraudulent credit card transactions using Python, Logistic Regression, risk scoring, and business analytics.
Transactions: 284K+ (Dataset Size)
Fraud Rate: ~0.17% (Class Imbalance)
ROC-AUC: ~0.97 (Model Score)
Fraud Recall: ~57% (Detection Rate)
Problem Statement
Credit card fraud is rare but financially damaging. Since fraudulent transactions represent only around 0.17% of the dataset, accuracy alone is misleading. This project focuses on recall and ROC-AUC to evaluate fraud detection more meaningfully.
With a fraud rate of about 0.17%, a model that labels every transaction as legitimate would score over 99.8% accuracy while catching no fraud at all, which is exactly the failure that matters most to financial institutions.
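The arithmetic behind that claim can be checked with a short sketch. The exact counts used here (284,807 transactions, 492 fraud cases) are the figures commonly reported for the public Kaggle dataset and are an assumption on top of the "284K+" and "~0.17%" stated above; the function name is hypothetical.

```python
# Assumed counts for the public Kaggle dataset; this page only states "284K+" and "~0.17%".
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def always_legitimate_baseline(total: int, fraud: int) -> tuple[float, float]:
    """Accuracy and fraud recall of a classifier that predicts 'legitimate' for every row."""
    true_negatives = total - fraud   # every legitimate transaction is labelled correctly
    true_positives = 0               # no fraud case is ever flagged
    accuracy = (true_negatives + true_positives) / total
    recall = true_positives / fraud
    return accuracy, recall


accuracy, recall = always_legitimate_baseline(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

print(f"Fraud rate: {fraud_rate:.2%}")
print(f"Baseline accuracy: {accuracy:.2%}")
print(f"Baseline fraud recall: {recall:.0%}")
```

The baseline looks near-perfect on accuracy and useless on recall, which is why the project reports recall and ROC-AUC instead.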
Dataset Overview
Source: Kaggle Credit Card Fraud Detection Dataset
Total Transactions: 284K+
Fraud Rate: ~0.17%
Target Variable: Class 0 = Legitimate, Class 1 = Fraud
Methodology
1. Data Loading & Cleaning: Import and preprocess raw transaction data
2. Exploratory Data Analysis: Analyze patterns and distributions
3. Feature Engineering: Create meaningful features from raw data
4. Logistic Regression Modeling: Train classification model
5. Model Evaluation: Assess performance metrics
6. Risk Tier Classification: Categorize transactions by risk level
7. Business Insight Generation: Extract actionable recommendations
Feature Engineering
Grouped transaction time into business-friendly periods for pattern analysis
Categorized transaction values into risk segments based on amount ranges
Applied log transformation to reduce skewness in transaction amount distribution
Categorized transactions into Low, Medium, High, and Critical risk levels
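The first three transformations above might look roughly like the sketch below. It is illustrative only: the bucket names, hour boundaries, and amount cutoffs are assumptions, not the values used in the project. It also assumes the dataset's `Time` column holds seconds elapsed since the first transaction, so the derived "hour" is relative to that first transaction rather than a true clock time.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def time_bucket(seconds_elapsed: float) -> str:
    """Map elapsed seconds to a coarse period of the day (bucket edges are assumed)."""
    hour = int(seconds_elapsed % SECONDS_PER_DAY // 3600)
    if hour < 6:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"


def amount_segment(amount: float) -> str:
    """Group a transaction amount into a value segment (cutoffs are assumed)."""
    if amount < 50:
        return "Small"
    if amount < 200:
        return "Medium"
    if amount < 1000:
        return "Large"
    return "Very Large"


def log_amount(amount: float) -> float:
    """log(1 + amount): compresses the long right tail and is defined for amount = 0."""
    return math.log1p(amount)


# Hypothetical (Time, Amount) pairs for illustration.
for seconds, amount in [(3_600, 12.50), (25_200, 149.62), (80_000, 2_125.87)]:
    print(time_bucket(seconds), amount_segment(amount), round(log_amount(amount), 3))
```

Using `log1p` rather than a plain logarithm avoids an error on zero-value transactions.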
Model Results
ROC-AUC: ~0.97
Fraud Recall: ~57%
Model: Logistic Regression
Evaluation Focus: Recall + ROC-AUC
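The dataset is highly imbalanced, so accuracy alone is not enough. Recall measures the share of actual fraud cases the model catches, while ROC-AUC measures how well the model ranks fraudulent transactions above legitimate ones across all thresholds. A ROC-AUC of ~0.97 means a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one about 97% of the time. A recall of ~57% means that roughly 43% of fraud cases are still missed at the decision threshold used.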
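To make the two metrics concrete, here is a minimal from-scratch sketch of recall, precision, and ROC-AUC. It is not the project's evaluation code; the function names, the 0.5 threshold, and the four-row toy data are assumptions. The pairwise ROC-AUC shown here is fine for small examples but too slow for the full dataset, where a library implementation would be the usual choice.

```python
def recall_and_precision(y_true, scores, threshold=0.5):
    """Recall and precision for the fraud class (label 1) at a given score threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_fraud = score >= threshold
        if predicted_fraud and label == 1:
            tp += 1
        elif predicted_fraud and label == 0:
            fp += 1
        elif not predicted_fraud and label == 1:
            fn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def roc_auc(y_true, scores):
    """Probability that a random fraud case outscores a random legitimate case (ties count half)."""
    fraud_scores = [s for y, s in zip(y_true, scores) if y == 1]
    legit_scores = [s for y, s in zip(y_true, scores) if y == 0]
    if not fraud_scores or not legit_scores:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(
        1.0 if f > l else 0.5 if f == l else 0.0
        for f in fraud_scores
        for l in legit_scores
    )
    return wins / (len(fraud_scores) * len(legit_scores))


# Toy data for illustration only.
labels = [0, 0, 1, 1]
model_scores = [0.10, 0.40, 0.35, 0.80]

toy_recall, toy_precision = recall_and_precision(labels, model_scores)
toy_auc = roc_auc(labels, model_scores)
print(toy_recall, toy_precision, toy_auc)
```

Recall and precision depend on the chosen threshold, while ROC-AUC does not, which is why a model can have a high ROC-AUC and a moderate recall at the same time.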
Key Insights
Fraud is rare but high impact
Accuracy is misleading for imbalanced fraud data
Fraud patterns vary across time buckets and transaction ranges
Risk scoring helps prioritize suspicious transactions
High and Critical risk cases can be reviewed first by fraud teams
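The last two points can be sketched as a tiering function plus a review queue. The cutoffs below are illustrative assumptions, not the thresholds used in the project, and the `ScoredTransaction` type and function names are hypothetical.

```python
from typing import NamedTuple


class ScoredTransaction(NamedTuple):
    txn_id: str
    risk_score: float  # model-estimated fraud probability in [0, 1]


# Assumed cutoffs, checked from highest to lowest; anything below the last one is "Low".
TIER_CUTOFFS = [(0.86, "Critical"), (0.74, "High"), (0.50, "Medium")]


def risk_tier(score: float) -> str:
    """Return the first tier whose cutoff the score meets."""
    for cutoff, tier in TIER_CUTOFFS:
        if score >= cutoff:
            return tier
    return "Low"


def review_queue(transactions):
    """High and Critical transactions, highest risk score first."""
    flagged = [t for t in transactions if risk_tier(t.risk_score) in {"High", "Critical"}]
    return sorted(flagged, key=lambda t: t.risk_score, reverse=True)


# Hypothetical scored transactions.
scored = [
    ScoredTransaction("A", 0.12),
    ScoredTransaction("B", 0.80),
    ScoredTransaction("C", 0.94),
    ScoredTransaction("D", 0.60),
]

for txn in review_queue(scored):
    print(txn.txn_id, txn.risk_score, risk_tier(txn.risk_score))
```

In practice the cutoffs would be tuned against review capacity and the cost of a missed fraud case.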
Interactive Dashboard
Explore fraud analytics with interactive charts and filters. Data based on the Kaggle Credit Card Fraud Detection dataset.
ROC-AUC Score: 0.97
Fraud Recall: 57%
Precision: 82%
Fraud Rate: 0.17%
Filtered Transactions: 15
Avg Risk Score: 0.79
Total Amount: $16,606.55
| Transaction ID | Amount | Time | Risk Score | Risk Tier |
|---|---|---|---|---|
| TXN-001 | $2,125.87 | 23:42 | 0.94 | Critical |
| TXN-002 | $1,847.32 | 02:15 | 0.91 | Critical |
| TXN-003 | $956.41 | 22:58 | 0.89 | Critical |
| TXN-004 | $1,523.90 | 01:33 | 0.87 | Critical |
| TXN-005 | $789.25 | 23:12 | 0.85 | High Risk |
| TXN-006 | $2,450.00 | 03:45 | 0.83 | High Risk |
| TXN-007 | $678.90 | 21:30 | 0.81 | High Risk |
| TXN-008 | $1,125.50 | 04:22 | 0.79 | High Risk |
| TXN-009 | $445.30 | 22:05 | 0.77 | High Risk |
| TXN-010 | $892.15 | 00:48 | 0.75 | High Risk |
| TXN-011 | $356.80 | 23:55 | 0.73 | Medium Risk |
| TXN-012 | $1,678.45 | 02:30 | 0.71 | Medium Risk |
| TXN-013 | $523.60 | 21:15 | 0.69 | Medium Risk |
| TXN-014 | $267.90 | 03:10 | 0.67 | Medium Risk |
| TXN-015 | $945.20 | 22:40 | 0.65 | Medium Risk |
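The dashboard's summary figures (filtered transaction count, average risk score, total amount) can be recomputed from the 15 rows in the table as a sanity check. This is a small sketch rather than the dashboard's own code; amounts are kept as `Decimal` values so the currency total is exact.

```python
from decimal import Decimal

# (transaction id, amount in USD, risk score), copied from the table above.
rows = [
    ("TXN-001", "2125.87", 0.94), ("TXN-002", "1847.32", 0.91),
    ("TXN-003", "956.41", 0.89), ("TXN-004", "1523.90", 0.87),
    ("TXN-005", "789.25", 0.85), ("TXN-006", "2450.00", 0.83),
    ("TXN-007", "678.90", 0.81), ("TXN-008", "1125.50", 0.79),
    ("TXN-009", "445.30", 0.77), ("TXN-010", "892.15", 0.75),
    ("TXN-011", "356.80", 0.73), ("TXN-012", "1678.45", 0.71),
    ("TXN-013", "523.60", 0.69), ("TXN-014", "267.90", 0.67),
    ("TXN-015", "945.20", 0.65),
]

count = len(rows)
total_amount = sum(Decimal(amount) for _, amount, _ in rows)
avg_risk = sum(score for _, _, score in rows) / count

print(f"Filtered Transactions: {count}")
print(f"Avg Risk Score: {avg_risk:.2f}")
print(f"Total Amount: ${total_amount:,.2f}")
```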
Business Impact
This project supports fraud investigation teams by scoring each transaction and grouping it into a risk tier, so reviewers can prioritize the cases most likely to be fraudulent.
Prioritizing High-Risk Transactions: Focus resources on suspicious activity
Reducing Missed Fraud Cases: Catch more fraudulent transactions
Improving Manual Review Efficiency: Streamline investigation workflow
Data-Driven Decision Making: Enable informed risk operations
Tech Stack
About the Author
Sumukhi Pandey
Aspiring Data Analyst | AI/LLM Evaluation Intern | B.Tech CSE
I am an aspiring Data Analyst and AI/LLM Evaluation Intern with hands-on experience in data analytics, model quality evaluation, and KPI-driven reporting. I work with Python, SQL, Power BI, and Excel to analyze large-scale datasets, identify performance trends, and transform raw data into actionable business insights. I have experience evaluating 10K+ AI-generated outputs, building dashboards to track key metrics, and improving model performance through data-driven analysis. My work focuses on bridging AI model evaluation with business analytics to enhance decision-making and operational efficiency. Currently pursuing a B.Tech in Computer Science, I am particularly interested in fintech analytics, fraud detection, risk modeling, and scalable data-driven solutions.