Streaming Observability · Data Engineering

Stop Debugging Kafka by Guessing

I build the streaming observability layer your infrastructure never had — OpenTelemetry, Prometheus, and Grafana deployed on your stack, so your team knows what's happening before your customers do.

AWS Certified Solutions Architect (SAA-C03)

AWS Certified Data Engineer (DEA-C01)

3+ yrs Kafka @ enterprise scale

OpenTelemetry · Prometheus · Grafana

Services

Fixed-scope. Known outcomes.

Productized engagements so you know exactly what you're getting — no runaway billing, no ambiguous deliverables.

Streaming Infrastructure

Start here

Streaming Visibility Audit

1-week assessment of your Kafka or Kinesis stack, written findings report, and live debrief.

$3,500 – $5,000

Most requested

Streaming Visibility Buildout

OpenTelemetry instrumentation, Prometheus metrics, and Grafana dashboards deployed on your stack.

$20,000 – $45,000

End-to-end

Platform Foundation

Serverless AWS data platform built from scratch — Lambda, Kinesis, S3, Glue, and Snowflake.

$30,000 – $80,000

Ongoing

Advisory Retainer

Monthly architecture review, async support, and runbook review. Min. 3-month commitment.

$3,000 – $6,000/mo

AI / LLM Integration Observability

Start here

AI Integration Observability Audit

1-week assessment of your AI tool telemetry gaps, written report, and debrief.

$3,500 – $5,000

Full stack

AI Integration Observability Buildout

OpenTelemetry instrumentation for AI tools, plus cost attribution and latency dashboards on Amazon Managed Service for Prometheus and Amazon Managed Grafana (AMP/AMG) or on self-hosted Prometheus + Grafana.

$15,000 – $35,000

Not sure where to start? The audit is the right first step for most teams.

Results

Before & after

45+ min → under 4 min incident detection time

Managed care platform · Kafka · Millions of members

Challenge

Kafka consumer lag spiked silently with no alerting. The team found out pipelines were broken when downstream processes failed — or when a stakeholder noticed. P1 incidents averaged 45+ minutes to detect.

Outcome

Deployed OpenTelemetry instrumentation across all producers and consumers, Prometheus recording rules, and Grafana dashboards with lag, throughput, and consumer group health panels. P1 detection dropped to under 4 minutes. Zero missed incidents in the first 90 days.

< 1 week to full AI tool visibility

Series B platform on EKS · AI coding assistants · 80+ engineers

Challenge

The engineering org rolled out AI coding assistants and integrated LLMs into their product. No telemetry existed on adoption, latency, cost per team, or errors. Finance couldn't explain the API bill.

Outcome

Built OpenTelemetry instrumentation on EKS, with metrics flowing through Amazon Managed Service for Prometheus into Grafana. Delivered dashboards for adoption rates, p95 latency, and cost attribution by team. Two high-cost integration patterns were identified and fixed within 30 days.

About

Mazen Abdelbasir

Senior Data & Cloud Engineer with 7+ years building production data infrastructure. I've spent the last several years solving the exact problems I now help clients fix — first at a managed care organization running Kafka pipelines at scale, then building full observability stacks for AI tool adoption on EKS.

Most engineering teams find out something broke when a customer complains. I build the visibility layer that changes that.

I work with Series B–D SaaS and mid-market engineering teams (30–300 engineers) running Kafka or Kinesis with no dedicated platform or observability team. I don't work in healthcare, health-tech, or insurance.

Go · Python · Kafka · Kinesis · OpenTelemetry · Prometheus · Grafana · AWS · Kubernetes · Terraform · Snowflake

Production Kafka at enterprise scale

3+ years building pipelines that process data for millions of managed care members. Not tutorials. Not side projects.

Full observability stack, end to end

Built AI/LLM integration telemetry on EKS: OpenTelemetry → Amazon Managed Service for Prometheus + self-hosted Prometheus → Amazon Managed Grafana + self-hosted Grafana. Both managed and self-hosted paths delivered.

AWS certified on both tracks

AWS Certified Solutions Architect (SAA-C03) + AWS Certified Data Engineer (DEA-C01). The architecture is sound — clients don't have to wonder.

Statistical rigor

MS Biomedical Sciences. Evidence-based diagnosis and validation — not gut feel — applied to how problems are scoped and solutions are measured.

Ready to see what your pipeline is doing?

Most teams I talk to already know something's wrong — they just don't have the data to prove it. Let's start there.