Michael (Mingwei) Sun

Data Analyst, Data Scientist

Connect

Feel free to contact me at smwkiwi@gmail.com

PROFESSIONAL PROFILE

➢ Reliable and diligent professional offering solid skills in complex analysis and mapping business goals, objectives and needs. Adept in applying machine learning algorithms to derive meaningful insights and definitive results, explaining reports and delivering presentations in an easy-to-understand way. Strong track record of deploying Business Intelligence Solutions, applying industry best practices, and using the latest technology in data and analytics. Participated in academic projects for the Ministry of Education and University of Auckland as part of Master's studies.

➢ Through advanced-level studies and internships, gained a deep understanding of the importance of getting to know datasets very well to build quality and accurate products. Demonstrates high professionalism in data processing, actively seeks to resolve problems and is committed to ongoing learning. Able to work in complex environments with enterprise applications. Proactive, willing to go the extra mile, a clear communicator and a collaborative team player.

Advanced Knowledge Base

Data Mining & Modelling
Performance Evaluation
Data Visualisation & Mining
Data Science Research Methods
Machine Learning Algorithms
Integrating Data Sources
SQL Server Administration
Big Data Technologies
Data Transformation & Cleaning
Data Insights & Results
Manipulating Complex Datasets
Generating & Analysing Reports

TECHNOLOGY & SOFTWARE LANGUAGES

Programming Languages

  • Python (data extraction, preprocessing, big data processing, decision tree modelling, hyperparameter tuning, automation in Databricks)
  • R & R Studio (acquiring and manipulating large and complex datasets, applying data mining techniques, analysing data using modern regression methods)
  • SQL (querying and manipulating large datasets, data extraction from Snowflake)

Data Engineering

  • Databricks (data preparation, exploratory data analysis, machine learning model development, automation)
  • Microsoft Azure (Azure Data Factory – orchestrating data pipelines for data preparation and model validation)
  • Snowflake (data storage, querying, and transformation)

ML & Statistical Analysis

  • Machine Learning Modelling (cross & model validation, hyperparameter tuning)
  • Exploratory Data Analysis (correlation analysis, data visualisation, statistical insights)

Data Visualisation

  • Power BI (data transformation, cleaning & modelling, generating & analysing reports)

Documentation & Collaboration

  • Confluence (technical documentation, research process documentation)
  • GitHub, Git (version control, collaboration)
  • Asana (project management, task tracking)

MS Office Suite

  • Excel (Statistics & Data Analysis)
  • Word
  • PowerPoint
  • Outlook
  • Teams

PROFESSIONAL EXPERIENCE

Intern Data and Analytics

Danone Nutricia New Zealand Ltd (NZN)

Nov 2024 - Feb 2025

Databricks, Snowflake, Microsoft Azure, GitHub, SQL, Asana
Responsibilities & Achievements
  • Shadowed senior data engineers on data transformation and pipeline development, gaining expertise in Databricks, Snowflake (DWH, DMT, DSP), and ADF integration. Independently built ADF ING and ANL pipelines for data transformation.
  • Led a research initiative to optimise process specifications (98 parameters) for milk powder production, improving consistency in physical powder properties (PPPs) at the Balclutha base powders factory. Managed project progress with collaborators in Asana using Agile methodology.
  • Extracted and processed large datasets from Snowflake using Databricks, ensuring efficient data preparation, cleaning, correlation analysis, and data splitting.
  • Implemented data pipeline automation using Azure Data Factory (ADF), saving the split train-test-validation datasets into ADF-produced folders as parquet files for seamless model training and validation.
  • Conducted exploratory data analysis (EDA) for each PPP, focusing on foam height, bulk density, and flecks, to identify key trends and optimise input features for modelling.
  • Designed and trained decision tree models with cross-validation and hyperparameter tuning, selecting the best performing model based on accuracy and robustness. The optimised decision tree model was saved in the Databricks models folder, ensuring reproducibility and scalability for future applications.
  • Created a dedicated validation notebook for testing the model on an untouched validation dataset, ensuring generalisation and performance accuracy before deployment.
  • Presented research findings and decision tree model results to leading team stakeholders, ensuring they fully understood the whole project.
Results Delivered
  • Automated data extraction, preparation, and storage using Databricks and ADF, enhancing workflow efficiency and eliminating manual data handling.
  • Created comprehensive Confluence documentation detailing data preprocessing, feature selection, model training, validation results, and key findings for stakeholder reference.
  • Standardised model validation practices, ensuring robust performance assessment prior to deployment.

Website Administrator (Online marketing & Sales)

NATORG International Ltd

Nov 2019 - Present

ERP, MS Excel, Power BI, CRM
Responsibilities & Achievements
  • Reporting: Analysed customer purchase data to glean insights for future marketing plans and prepared detailed weekly sales and revenue reports.
  • Administration Support: Ensured the website data was up to date, properly functioning, and visually appealing – monitored website performance, conducted audits, and ensured optimal speed, responsiveness and user experience.
  • Knowledge Management: Managed two employees, offering guidance on marketing and sales issues. Collaborated with designers and vendors to create and implement weekly targeted promotional plans and content.
  • Customer Support: Served as the primary contact point for WeChat Mall customers, responding quickly to all inquiries, investigating and resolving complaints, and ensuring all customers received top-tier service in all interactions.
  • CRM System Administration: Updated and maintained the CRM system, managed the online customer database and stock levels, and carried out a comprehensive analysis of the systems to ensure optimal functionality.
  • Office Management: Oversaw and managed daily operational activities for WeChat Mall, including updating products and promotional banners, designing and publishing advertisements, and coordinating with suppliers and the warehouse team on product details.
Results Delivered
  • Proactively helped the Manager use Power BI to process monthly sales data and generate a visual dashboard, saving hours of work that previously went into manual processing in Excel.
  • Increased online sales revenue by 20% yearly and expanded the customer database by 10% through effective prospecting and outstanding service.

ACADEMIC PROJECTS

Data Science Dissertation (A+) - Software Application Usage in New Zealand's Education Sector: Analysing Seasonal Trends and Global Applicability

Ministry of Education, New Zealand

Power BI, R, R Studio, Graylog
Project Details
  • Collaborated with the Engagement and External Communications team at MOE to analyse application log data and user report data from educational institutions
  • Identified user usage patterns and seasonal trends
  • Merged log data from 17 independent applications (10 GB) with user report datasets, then cleaned the combined large, complex dataset using R Studio and Power BI
  • Created interactive dashboards and multi-dimensional visualisation charts, highlighting various educational applications' seasonal trends and usage patterns
  • Organised weekly meetings to report on the project's progress, identified and raised problems, and discussed solutions with the mentor and project leader
Results & Achievements
  • The project's findings enabled better resource allocation and server management, leading to improved operational efficiency of educational systems
  • Provided a reference model for other regions considering digital education systems

Big Data & Data Mining Project – Europe Hotel Satisfaction Score Survey

University of Auckland

SPSS Modeller, Scikit-Learn, Python, AWS, PySpark, Git
Project Details
  • Developed and implemented data mining algorithms for large-scale datasets
  • Applied machine learning techniques to extract meaningful insights
  • Created data visualisation dashboards for complex data analysis
  • Collaborated with research team on methodology and implementation
Results & Achievements
  • Successfully processed and analysed large-scale hotel satisfaction datasets
  • Developed innovative data mining approaches for hospitality data
  • Created comprehensive visualisation tools for data interpretation
Pandas, GloVe, PyTorch, Matplotlib, Seaborn
Project Details
  • Worked with a dataset of 982,619 reviews, extracting and filtering review texts and using a subset of 200,000 samples for training due to computational constraints
  • Developed a deep learning model to predict Amazon Kindle product ratings based on user reviews, involving data preprocessing, feature extraction with GloVe embeddings, and LSTM model development using PyTorch
  • Evaluated the model using precision, recall, F1-score, and accuracy metrics; achieved an accuracy below 50%, highlighting the potential for improvement with more training data and epochs
  • Successfully implemented the model and identified potential improvements
Results & Achievements
  • Identified potential improvements for future iterations
  • Future improvements could include utilising more computational resources and advanced models such as Transformers for better performance

QUALIFICATIONS

Master of Data Science

The University of Auckland

Auckland, New Zealand

2022 - 2024

  • Data Science Dissertation (A+)
  • Data Mining & Big Data (A+)
  • Topics in Official Statistics (A)

Bachelor of Science (Computer Science & Statistics)

The University of Auckland

Auckland, New Zealand

2016 - 2018

TECHNICAL CERTIFICATIONS

Databricks Fundamentals

Databricks

2025

Azure Data Fundamentals

Microsoft

2022

Advanced Google Analytics

Google Analytics

2022

Power BI

Microsoft

2020

REFERENCES