Lư Xuân Dương

Data Science Student & Aspiring Data Engineer

Building scalable data pipelines and robust infrastructure to transform raw data into reliable, analysis-ready assets for data-driven insights.

Tech Stack

Python • Spark • Kafka

Passion

Data Engineering


About Me

Hello! I'm Xuan Duong, a final-year Data Science student at the University of Science, VNU-HCMC. My passion for technology is rooted in systems programming and logical problem-solving, which has guided my focus toward Data Engineering.

My ambition is to become a Data Engineer, specializing in architecting and automating scalable data pipelines. I am currently focused on mastering core technologies like Python, SQL, Apache Spark, and Docker to build robust data infrastructure. I am eager to apply an engineering mindset to solve complex data challenges and build the reliable foundations that empower organizations to make data-driven decisions.


Education

University of Science, VNU-HCMC

Data Science Major

Expected Graduation

11/2026

GPA

3.48/4.00

Location

Ho Chi Minh City, Vietnam

Work Experience


Database Administrator (DBA)

Sacombank (Sai Gon Thuong Tin Commercial Joint Stock Bank)

01/2026 – 03/2026 (3 months)


Engineered and optimized high-performance ELT pipelines with Oracle Data Integrator (ODI) to migrate over 1,000,000 records daily between Oracle databases, keeping source and target systems synchronized with minimal downtime.

Leveraged Check Knowledge Modules (CKMs) to build data quality firewalls that automatically identified and filtered out the 2% of records that were anomalous, protecting data warehouse integrity across critical financial systems.

Oracle Database • Oracle Data Integrator (ODI) • Data Quality • ETL/ELT • Performance Optimization • Data Pipeline
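In ODI, a CKM checks the constraints declared on a target datastore and diverts rows that violate them into an error table instead of loading them. The sketch below imitates that "firewall" idea in plain Python rather than ODI. It is not the pipeline described above, and the rule names and columns (`account_id`, `amount`, `currency`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class Rule:
    """A named condition every row must satisfy (analogous to a declared constraint)."""
    name: str
    check: Callable[[Row], bool]


# Hypothetical rules for illustration only; a real CKM enforces the target's declared constraints
RULES: List[Rule] = [
    Rule("account_id_not_null", lambda r: r.get("account_id") is not None),
    Rule("amount_non_negative",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    Rule("currency_is_known", lambda r: r.get("currency") in {"VND", "USD"}),
]


def split_valid_and_rejected(rows: List[Row], rules: List[Rule] = RULES) -> Tuple[List[Row], List[Row]]:
    """Send each row to the clean flow, or to a reject list annotated with the rules it broke."""
    valid: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        failed = [rule.name for rule in rules if not rule.check(row)]
        if failed:
            # Copy the row so the caller's input is not mutated
            rejected.append({**row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, rejected


if __name__ == "__main__":
    sample = [
        {"account_id": 1, "amount": 500.0, "currency": "VND"},
        {"account_id": None, "amount": 20.0, "currency": "USD"},
        {"account_id": 3, "amount": -5, "currency": "EUR"},
    ]
    clean, errors = split_valid_and_rejected(sample)
    print(f"loaded {len(clean)}, rejected {len(errors)}")
    for err in errors:
        print(err["account_id"], err["failed_rules"])
```

Keeping the list of failed rules on each rejected row mirrors the purpose of an error table: bad records are quarantined with a reason attached, so they can be reviewed and reprocessed instead of silently dropped.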

My Projects

Real-Time CDC Pipeline

Python • Spark • Kafka • Debezium • MySQL • MongoDB

Built a real-time change data capture (CDC) pipeline that syncs data from MySQL to MongoDB using Debezium and Kafka, optimized for large datasets through parallel processing.

Feb – Mar 2026 • Latest
View on GitHub
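Debezium publishes each row change to Kafka as an event whose envelope carries `before`, `after`, and an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). A minimal sketch of turning one such event into a MongoDB write, assuming JSON-serialized message values and a hypothetical primary-key column `id` that becomes the document's `_id`. The Kafka consumer and MongoDB client are left out so the mapping logic stays self-contained; this is not the project's actual code.

```python
import json
from typing import Optional, Union


def to_mongo_op(raw_value: Optional[Union[str, bytes]], key_field: str = "id") -> Optional[dict]:
    """Translate one Debezium change event (a Kafka message value) into a MongoDB-style write.

    key_field is a hypothetical primary-key column name; its value becomes the document's _id.
    """
    if raw_value is None:
        # Tombstone (null value) that Debezium can emit after a delete; nothing to apply
        return None
    event = json.loads(raw_value)
    # With the JSON converter's schemas enabled, the envelope sits under "payload"
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all become a full replace keyed on the PK
        after = payload["after"]
        doc = {k: v for k, v in after.items() if k != key_field}
        return {
            "type": "replace",
            "filter": {"_id": after[key_field]},
            "document": {"_id": after[key_field], **doc},
        }
    if op == "d":
        # Deletes only carry the old row state in "before"
        before = payload["before"]
        return {"type": "delete", "filter": {"_id": before[key_field]}}
    raise ValueError(f"unsupported Debezium op: {op!r}")
```

Replacing documents by key keeps the sink idempotent, so if a Kafka consumer sees the same event again after a restart, reapplying it does not create duplicates. With `pymongo`, each returned dict could be mapped to `ReplaceOne(filter, document, upsert=True)` or `DeleteOne(filter)` and written in batches through `collection.bulk_write`.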

CGV ETL Data Pipeline

Apache Airflow • Apache Spark • Docker • ETL Pipeline • Selenium

Engineered a fully automated ETL pipeline to collect, process, and visualize movie schedules from 80 CGV cinemas nationwide.

Oct 2025 • Personal
View on GitHub

House Price Prediction

Data Science • LLMs • Feature Engineering • EDA

Developed a predictive model to estimate house prices from collected property features, supported by comprehensive exploratory data analysis (EDA) and visualization.

May – Jun 2024 • Team Lead
View on GitHub

Get In Touch

Let's connect and create something amazing together

Location

Thu Duc, Ho Chi Minh City, Vietnam