Streaming Observability · Data Engineering
Stop Debugging Kafka by Guessing
I build the streaming observability layer your infrastructure never had — OpenTelemetry, Prometheus, and Grafana deployed on your stack, so your team knows what's happening before your customers do.
Services
Fixed-scope. Known outcomes.
Productized engagements so you know exactly what you're getting — no runaway billing, no ambiguous deliverables.
Streaming Infrastructure
Streaming Visibility Audit
1-week assessment of your Kafka or Kinesis stack, written findings report, and live debrief.
$3,500 – $5,000
Streaming Visibility Buildout
OpenTelemetry instrumentation, Prometheus metrics, and Grafana dashboards deployed on your stack.
$20,000 – $45,000
Platform Foundation
Serverless AWS data platform built from scratch — Lambda, Kinesis, S3, Glue, and Snowflake.
$30,000 – $80,000
Advisory Retainer
Monthly architecture review, async support, and runbook review. Min. 3-month commitment.
$3,000 – $6,000/mo
AI / LLM Integration Observability
AI Integration Observability Audit
1-week assessment of your AI tool telemetry gaps, written report, and debrief.
$3,500 – $5,000
AI Integration Observability Buildout
OpenTelemetry instrumentation for AI tools, plus cost attribution and latency dashboards on Amazon Managed Service for Prometheus and Amazon Managed Grafana (AMP/AMG) or self-hosted Prometheus + Grafana.
$15,000 – $35,000
Not sure where to start? The audit is the right first step for most teams.
Results
Before & after
Managed care platform · Kafka · Millions of members
Challenge
Kafka consumer lag spiked silently with no alerting. The team found out pipelines were broken when downstream processes failed — or when a stakeholder noticed. P1 incidents averaged 45+ minutes to detect.
Outcome
Deployed OpenTelemetry instrumentation across all producers and consumers, Prometheus recording rules, and Grafana dashboards with lag, throughput, and consumer group health panels. P1 detection dropped to under 4 minutes. Zero missed incidents in the first 90 days.
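For readers unfamiliar with the lag signal behind those panels, here is a minimal sketch, not the client's code. It assumes you already have the log end offset and the consumer group's committed offset for each partition; the function names, the `orders` topic, and the threshold of 100 messages are all hypothetical.

```python
from typing import Dict, List, Tuple

# (topic, partition) -> offset
Offsets = Dict[Tuple[str, int], int]


def partition_lag(end_offsets: Offsets, committed: Offsets) -> Offsets:
    """Lag per partition: log end offset minus the group's committed offset."""
    lag: Offsets = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts as fully lagging.
        done = committed.get(tp, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[tp] = max(end - done, 0)
    return lag


def total_lag(lag: Offsets) -> int:
    """Total lag for the consumer group across all partitions."""
    return sum(lag.values())


def lagging_partitions(lag: Offsets, threshold: int) -> List[Tuple[str, int]]:
    """Partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


# Hypothetical sample data for one consumer group on one topic.
end_offsets = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 2000}
committed = {("orders", 0): 1480, ("orders", 1): 980}

lag = partition_lag(end_offsets, committed)
alerts = lagging_partitions(lag, threshold=100)
```

In a real deployment the same subtraction would typically be exported as a metric and alerted on over a time window rather than evaluated once, so a brief spike does not page anyone.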
Series B platform on EKS · AI coding assistants · 80+ engineers
Challenge
The engineering org rolled out AI coding assistants and integrated LLMs into their product. No telemetry existed on adoption, latency, cost per team, or errors. Finance couldn't explain the API bill.
Outcome
Built OpenTelemetry instrumentation flowing through Amazon Managed Service for Prometheus and Grafana on EKS. Delivered dashboards for adoption rates, p95 latency, and cost attribution by team. Two high-cost integration patterns identified and fixed within 30 days.
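To make "cost attribution by team" and "p95 latency" concrete, here is a minimal sketch, not the delivered implementation. It assumes each LLM call is recorded with a team label, token counts, and latency. The `LLMCall` record, the per-token prices, and the team names are hypothetical, and the percentile uses the simple nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LLMCall:
    team: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015


def call_cost(call: LLMCall) -> float:
    """Cost of one call from its token counts."""
    return (call.input_tokens / 1000) * INPUT_PRICE_PER_1K + (
        call.output_tokens / 1000
    ) * OUTPUT_PRICE_PER_1K


def cost_by_team(calls: List[LLMCall]) -> Dict[str, float]:
    """Sum call costs under each team label."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call.team] += call_cost(call)
    return dict(totals)


def p95(values: List[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample calls.
calls = [
    LLMCall("platform", 1000, 500, 800.0),
    LLMCall("platform", 2000, 1000, 1200.0),
    LLMCall("search", 4000, 2000, 3000.0),
]

costs = cost_by_team(calls)
latency_p95 = p95([c.latency_ms for c in calls])
```

The team label is the important design choice: if it is attached to the telemetry at the point of the call, the API bill can be broken down per team without reconstructing ownership afterwards.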
About
Mazen Abdelbasir
Senior Data & Cloud Engineer with 7+ years building production data infrastructure. I've spent the last several years solving the exact problems I now help clients fix — first at a managed care organization running Kafka pipelines at scale, then building full observability stacks for AI tool adoption on EKS.
Most engineering teams find out something broke when a customer complains. I build the visibility layer that changes that.
I work with Series B–D SaaS and mid-market engineering teams (30–300 engineers) running Kafka or Kinesis with no dedicated platform or observability team. I don't work in healthcare, health-tech, or insurance.
Production Kafka at enterprise scale
3+ years building pipelines that process data for millions of managed care members. Not tutorials. Not side projects.
Full observability stack, end to end
Built AI/LLM integration telemetry on EKS: OpenTelemetry → Amazon Managed Service for Prometheus + self-hosted Prometheus → Amazon Managed Grafana + self-hosted Grafana. Both managed and self-hosted paths delivered.
AWS certified on both tracks
AWS Certified Solutions Architect - Associate (SAA-C03) + AWS Certified Data Engineer - Associate (DEA-C01). The architecture is sound, so clients don't have to wonder.
Statistical rigor
MS Biomedical Sciences. Evidence-based diagnosis and validation — not gut feel — applied to how problems are scoped and solutions are measured.
Ready to see what your pipeline is doing?
Most teams I talk to already know something's wrong — they just don't have the data to prove it. Let's start there.