Hello there, this is a really interesting idea for a model. Let me give you a starting point in the form of a skeleton approach below. Have a look, and if you think it's a good enough starting point, hit me up. I've previously built models to recognize vehicles and their number plates, so I understand how this can be done.
Technical Approach:
Data Collection: Assemble a diversified dataset of images and videos across various lighting and weather conditions, including congested traffic scenarios.
Preprocessing: Implement image normalization, augmentation (rotation, scaling), and region-of-interest (ROI) extraction to enhance model training robustness.
Model Architecture: Utilize a dual-stage approach where the first stage detects vehicles (YOLOv5/SSD) and the second stage classifies the detected vehicles using a fine-tuned CNN (ResNet-50/VGG-16) for specific vehicle attributes.
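Training: Apply transfer learning by starting from pre-trained weights (ImageNet weights for the ResNet-50/VGG-16 classifier; the publicly released YOLOv5 detector weights are trained on COCO), then fine-tune with supervised learning on the custom dataset. Keep the augmentations from the preprocessing step active during training to improve generalization.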
Optimization: Use batch normalization and dropout to reduce overfitting. Apply quantization and pruning to compress the model so inference stays efficient.
Evaluation: Evaluate with cross-validation, using IoU (Intersection over Union) and mAP (mean Average Precision) for detection accuracy and a confusion matrix for the classification stage.
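To make the evaluation step a bit more concrete, here is a minimal sketch of the two simplest metrics mentioned above: IoU for a pair of boxes and a confusion matrix for the classifier. It is plain Python with no dependencies. The box format `(x_min, y_min, x_max, y_max)`, the function names, and the example labels are my own assumptions for illustration, not something fixed by your project. mAP builds on IoU (a detection counts as correct when its IoU with a ground-truth box passes a threshold), but I've left it out to keep this short.

```python
from collections import Counter


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    # Corners of the overlapping rectangle (if there is one).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so non-overlapping boxes give an intersection of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical example data, just to show the call shapes.
predicted_box = (1, 1, 3, 3)
ground_truth_box = (0, 0, 2, 2)
overlap = iou(predicted_box, ground_truth_box)  # intersection 1, union 7

labels = ["car", "truck"]
y_true = ["car", "car", "truck", "truck"]
y_pred = ["car", "truck", "truck", "truck"]
matrix = confusion_matrix(y_true, y_pred, labels)  # [[1, 1], [0, 2]]
```

In a real project I would most likely lean on an existing library for these rather than hand-rolling them; the sketch is only meant to show what the numbers mean.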