Data Engineer · Mexico

Building the Data Infrastructure of tomorrow

Hi, I'm Carlos Pulido Rosas. I design and build scalable data pipelines, multicloud architectures, and ML-powered systems that turn raw data into decisions.

Technology Stack

Tools of the trade

From ingestion to transformation, orchestration to serving: the full modern data stack.

🐍 Python
🗄️ SQL
❄️ Snowflake
☁️ BigQuery
🧱 Databricks
⚡ Apache Spark
🌬️ Apache Airflow
📦 Docker
🏗️ Terraform
🔷 Azure
🟠 AWS
🔵 GCP
🐘 PostgreSQL
🔀 Apache Kafka
🤖 PySpark ML
📊 dbt
Open Source

Featured Projects

End-to-end pipelines, ML models, and data infrastructure built in the open.

🌐
Multicloud Data Pipelines
Cross-cloud data pipelines spanning AWS, Azure, and GCP with Infrastructure as Code, CI/CD automation, and security best practices.
Terraform · AWS · Azure · GCP
📈
Crypto Analytics Pipeline
End-to-end ETL pipeline for cryptocurrency market analysis using the CoinGecko API, PostgreSQL, and Docker with automated reporting.
Python · PostgreSQL · Docker · ETL
🤖
AI Automation Risk — Jalisco
Master's thesis: predictive model analyzing AI-driven labor displacement risk across industries in Jalisco, Mexico (2025–2030) using PySpark and ML.
PySpark · Machine Learning · Research
📊
VBA Macros & Excel Automation
A collection of Excel macros that simplify routine business tasks, reorganize data, and generate reports automatically, bridging legacy workflows and modern data tooling.
Python · VBA · Automation
What I do

Services

End-to-end data engineering — from raw ingestion to production-ready infrastructure.

🔁
Pipeline Design & Implementation
Build robust, scalable ETL/ELT pipelines with Python, Airflow, and Spark, from API ingestion to warehouse delivery.
❄️
Data Warehouse Setup
Design and implement data warehouses on Snowflake, BigQuery, or Databricks — schema design, optimization, and cost control.
🌐
Multicloud Architecture
Cross-cloud data infrastructure on AWS, Azure, and GCP with Terraform IaC, CI/CD pipelines, and security best practices.
🔍
Data Audits & Consulting
Review your current data stack, identify bottlenecks, reduce costs, and define a roadmap to a modern data architecture.
Case Studies

Problems solved

Real projects, real challenges, real outcomes.

01 Infrastructure
Unified data pipelines across 3 clouds
Challenge: Data siloed in AWS, Azure, and GCP with no unified layer.
Solution: Cross-cloud pipelines with Terraform IaC and automated CI/CD.
Result: Single source of truth, reproducible infrastructure.
02 ETL · Analytics
Daily crypto market analytics pipeline
Challenge: Raw market data from the CoinGecko API with no structure or history.
Solution: Automated ETL pipeline with PostgreSQL storage and Docker deployment.
Result: Queryable market history updated daily.
03 ML · Research
Predicting AI labor displacement risk in Jalisco
Challenge: Quantify which jobs in Jalisco are most at risk from AI automation through 2030.
Solution: PySpark ML pipeline on labor market microdata.
Result: Sector-level risk scores published as a master's thesis.

Data engineering insights & early access

Articles on pipelines, cloud architecture, and the future of data — plus early access when carpuro.ai launches its first product.

Let's build something remarkable

Interested in data engineering, consulting, or collaborating on a project? I'm open to conversations.

Get in touch →