Cloud Data Migration (2)

Goal

Future-proof a public sector client's data management processes (collection, sharing, use, storage, and publication). Consolidate data assets from multiple source systems so analysts can gain better insights and enrich their analyses.

Role

  • Developed and optimized AWS-based ETL processes: Designed and implemented 5 AWS Lambda functions and multiple AWS Glue ETL jobs, enhancing data ingestion and processing efficiency. This delivered a 30% reduction in data pipeline execution time by automating data mapping, data asset initialization, and data preparation workflows (a sketch of the trigger pattern follows this list).
  • Implemented robust data sanitization and compliance measures: Enhanced data quality and regulatory compliance by debugging and extending AWS Glue ETL jobs to perform comprehensive data sanitization and personally identifiable information (PII) identification and redaction (see the redaction sketch after this list). This ensured all data handling adhered to GDPR standards, securing sensitive information across all datasets.
  • Streamlined data migration and documentation: Authored high-level and low-level design documents and testing plans to ensure clear, maintainable code and reliable data migration processes. Developed 25 custom scripts for efficient data migration into PostgreSQL on AWS, facilitating the seamless transition of critical data assets with minimal downtime (an example loader is sketched after this list).
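
A minimal sketch of the trigger pattern behind these pipelines, assuming an S3 event notification starts a Glue ETL job via Lambda (the job name and arguments are hypothetical placeholders, not the client's actual configuration):

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Fired by an S3 PutObject notification; hand the new object to a Glue ETL job.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        run = glue.start_job_run(
            JobName="prepare-data-assets",  # hypothetical Glue job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        return {"JobRunId": run["JobRunId"]}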
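
The PII redaction step can be illustrated with Amazon Comprehend's detect_pii_entities API (the service named in the tech stack below). This is a simplified sketch rather than the project's actual Glue job logic, and real usage must respect Comprehend's per-call text size limits:

    import boto3

    comprehend = boto3.client("comprehend")

    def redact_pii(text: str, language: str = "en") -> str:
        """Replace PII spans found by Comprehend with [ENTITY_TYPE] placeholders."""
        entities = comprehend.detect_pii_entities(Text=text, LanguageCode=language)["Entities"]
        # Redact from the end of the string so earlier offsets remain valid.
        for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
            text = text[:e["BeginOffset"]] + "[" + e["Type"] + "]" + text[e["EndOffset"]:]
        return text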
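
And an example of what one of the migration loaders might look like, assuming prepared CSV extracts are bulk-loaded into PostgreSQL with COPY (the connection string, table, and file names are illustrative):

    import psycopg2

    def load_extract(dsn: str, table: str, csv_path: str) -> None:
        """Bulk-load a prepared CSV extract into a PostgreSQL table via COPY."""
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            with open(csv_path, encoding="utf-8") as f:
                cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)", f)

    load_extract("postgresql://analyst:secret@dwh-host:5432/warehouse", "assets", "assets.csv")

COPY streams the whole file in a single statement, which keeps large migrations fast and transactional compared with row-by-row INSERTs.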

Capability

Data-driven decision-making and reporting, enabled via four custom-built capabilities:

  1. Data Integration: Combined data from various sources into a unified data warehouse.
  2. Data Transformation: Standardised and sanitised data for consistency and accuracy.
  3. Data Accessibility: Provided a user-friendly interface for accessing and querying data.
  4. Reporting and Analytics: Enabled comprehensive reporting and analytics capabilities.

Tech Stack

  • Python – coding language for data pipelines
  • SQL – querying (DQL), data manipulation (DML), and schema definition (DDL)
  • AWS Cloud Services
    • Aurora RDS – hosts the data warehouse
    • Glue – for data pipelines
    • Lambda – serverless compute platform
    • Step Functions – workflow orchestrator
    • S3 – cloud storage
    • Comprehend – machine-learning-powered, continuously trained Natural Language Processing (NLP) service, used to identify and redact personally identifiable information (PII)
    • CodeCommit – for repository management
    • CloudShell – browser-based CLI environment
  • Apache Airflow – for orchestration of data pipelines using DAGs (see the DAG sketch after this list)
  • Docker Engine / Desktop – to create containers for the Airflow environment and the PostgreSQL instance
  • DevOps – GitHub Actions, Terraform (CI/CD and infrastructure as code)
  • Git – version control
  • PyCharm – IDE
  • Atlassian Toolset – Jira / Confluence
  • SharePoint – content collaboration and workspace
  • Mural – content collaboration / design
  • Microsoft Visio – solution design
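
To illustrate the Airflow orchestration mentioned in the stack above, here is a minimal DAG sketch with placeholder extract/transform/load tasks (the DAG id, schedule, and task bodies are assumptions, not the project's actual pipeline):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # pull data from a source system

    def transform():
        pass  # standardise and sanitise the extract

    def load():
        pass  # write the result to the warehouse

    with DAG(
        dag_id="asset_migration",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_transform >> t_load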

Year

2024

Filed Under: Data Pipelines
