3+ years of hands-on experience building real-time data pipelines, intelligent ETL workflows, and scalable AI-powered solutions
- 🔭 Data Engineer specializing in Big Data, Cloud ETL, and Automation
- 🌱 Expertise in Apache Spark, Kafka, AWS, Airflow, Selenium, GenAI & RAG Systems
- 💡 Passionate about real-time streaming analytics, web automation, and intelligent data pipelines
- 🎯 Built production-grade systems for fraud detection, RAG pipelines, and ETL automation
- 📫 Reach me: vicky0x07@gmail.com
- ⚡ Fun fact: I automate everything - from data pipelines to web scraping!
Production-ready automation tool for downloading complete O'Reilly courses with automatic organization
- 🎥 Video + Transcript extraction with Selenium automation
- 🚀 Headless mode with Chrome DevTools Protocol
- 📁 Smart chapter-based organization
- ⚡ 10x faster transcript-only mode
- 🔄 Resume capability for interrupted downloads
Tech: Python, Selenium, FFmpeg, Chrome DevTools Protocol
Currently working on exciting data engineering and automation projects. Stay tuned!
class NikeshChavhan:
    def __init__(self):
        self.name = "Nikesh Chavhan"
        self.role = "Data Engineer"
        self.location = "Nagpur, India"
        self.experience = "3+ years"

    def get_skills(self):
        return {
            "big_data": ["Apache Spark", "Kafka", "Flink", "AWS Kinesis"],
            "cloud_etl": ["AWS (S3, Lambda, Redshift, EMR, Glue)", "Airflow", "DBT"],
            "automation": ["Selenium", "Puppeteer", "BeautifulSoup", "Scrapy"],
            "genai_rag": ["LLMs (GPT, LLaMA)", "RAG Pipelines", "Prompt Engineering"],
            "ml_analytics": ["XGBoost", "LightGBM", "scikit-learn", "Pandas"],
            "devops": ["Docker", "Kubernetes", "Terraform", "GitHub Actions"],
            "monitoring": ["Prometheus", "Grafana", "ELK Stack"],
            "databases": ["Snowflake", "Redshift", "PostgreSQL", "MongoDB"]
        }

    def current_focus(self):
        return [
            "🔥 Real-time data pipelines with Spark Streaming",
            "🤖 Building production-grade RAG systems",
            "🌐 Web automation with Selenium/Puppeteer",
            "☁️ Auto-scaling ETL pipelines on AWS",
            "📊 Streaming analytics with Kafka + Redshift"
        ]

- ✅ Real-Time Data Pipelines: Kafka + Spark Streaming for sub-second processing
- ✅ Web Automation & Scraping: Selenium, Puppeteer for intelligent data extraction
- ✅ GenAI & RAG Systems: LLM-powered pipelines with vector search + generation
- ✅ Cloud ETL: AWS (S3, Lambda, Glue, Redshift, EMR) + Airflow orchestration
- ✅ ML & Analytics: XGBoost, scikit-learn for fraud detection and predictions
- ✅ Auto-Scaling Infrastructure: Cost-optimized pipelines with Terraform + K8s
- ✅ CI/CD Automation: Docker + Kubernetes + GitHub Actions
Shivaji Science College, Nagpur | BS in Computer Science (2017-2021) | CGPA: 8/10
Certifications:
- 🏆 Data Engineering Associate (ongoing) - AWS
- 🏆 Data Engineering Professional (ongoing) - Google Cloud
- 🏆 Meta Database Engineer Professional - Coursera
- 🏆 Data Scientist Professional - Datacamp
⭐ From Nikesh Chavhan | Data Engineer & Automation Enthusiast


