AWS Infrastructure · Serverless Architecture · IaC · Privacy Engineering

ZoomZoom Cloud Inc

Cloud Consulting / Professional Services · Founder-led

Zero
Tracking technologies used
< $1/mo
Ongoing infrastructure cost
None
Servers to manage
Canada (ca-central-1)
Data residency

The Challenge

Every website needs analytics. The question is what kind, and at what cost — to the budget, to your infrastructure, and to the privacy of the people visiting your site.

The default answer in 2024 is to drop a JavaScript snippet from a third-party analytics provider into every page. It is fast to set up, and the dashboards look good in a demo. But the trade-offs are real. You are sending your visitors’ data to a third party whose business model depends on that data. You are adding a cookie banner, because you are legally required to. You are creating a cross-site tracking profile. You are trusting a vendor’s retention policy. You are paying a monthly subscription for something that is ultimately just counting HTTP requests.

For a cloud consultancy that advises clients on privacy-respecting infrastructure design, that approach is the wrong message entirely. The requirement was clear: understand how the site is being used, without compromising the privacy of the people using it.

Our Approach

We built a fully serverless analytics pipeline on top of the access logs that Amazon CloudFront already generates — logs that exist whether you look at them or not, contain no cookies, require no JavaScript on the client, and never leave your AWS account.

Architecture: CloudFront Standard Logging

CloudFront standard access logs record every HTTP request the CDN handles: the requested URL, the HTTP status code, bytes transferred, response time, the edge POP that served the request, the HTTP protocol version, and the referrer. No cookies. No tracking pixels. No client fingerprinting. The client’s IP address is included as a standard log field, the same data that has appeared in every web server access log since the early 1990s.

These logs land in Amazon S3 as gzip-compressed tab-separated files, named with a timestamp and the distribution ID. The raw delivery format is flat — all files in a single prefix — which makes them straightforward to receive but inefficient to query at scale.
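To illustrate the delivery format, a minimal parser might read the column order from the log file's own `#Fields:` header rather than hard-coding positions. This is a sketch, not the pipeline's actual code, and the field names in the example are a small subset of the real schema:

```python
import gzip
import io

def parse_cloudfront_log(data: bytes) -> list:
    """Parse one gzip-compressed CloudFront standard log file into row dicts.

    Column order is read from the '#Fields:' header line rather than
    hard-coded, so the parser tolerates schema additions.
    """
    fields, rows = None, []
    with gzip.open(io.BytesIO(data), "rt") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                # Field names are space-separated in the header
                fields = line[len("#Fields:"):].split()
            elif line.startswith("#") or not line or fields is None:
                continue  # skip the version header and anything before #Fields:
            else:
                # Data rows are tab-separated, one request per line
                rows.append(dict(zip(fields, line.split("\t"))))
    return rows
```

In practice Athena does this parsing for you; the point of the sketch is simply that the raw objects are ordinary text files anyone can inspect.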

Partition by Date: Lambda Log Mover

An AWS Lambda function is triggered automatically each time CloudFront delivers a new log file to S3. It reads the date from the filename, copies the file to a date-partitioned path structure (year=YYYY/month=MM/day=DD/), and deletes the original. This is a sub-second operation and costs a fraction of a cent per thousand invocations — effectively free at the traffic volumes of a consultancy website.
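The mover's core logic can be sketched in a few lines of Python. The `partitioned/` prefix and the handler wiring are illustrative assumptions, not the deployed code; the only thing relied on is the `YYYY-MM-DD` date embedded in CloudFront's standard log filename:

```python
import re

# CloudFront names each log object <distribution-id>.YYYY-MM-DD-HH.<unique-id>.gz;
# the embedded date is all the mover needs.
DATE_RE = re.compile(r"\.(\d{4})-(\d{2})-(\d{2})-\d{2}\.")

def partitioned_key(key: str, prefix: str = "partitioned") -> str:
    """Map a flat log key to a year=/month=/day= Hive-style partition path."""
    m = DATE_RE.search(key)
    if m is None:
        raise ValueError(f"unrecognised CloudFront log key: {key}")
    year, month, day = m.groups()
    filename = key.rsplit("/", 1)[-1]
    return f"{prefix}/year={year}/month={month}/day={day}/{filename}"

def handler(event, context):
    """Lambda entry point for the S3 put-event notification."""
    import boto3  # imported lazily so partitioned_key stays testable offline
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # S3 has no rename: copy to the partitioned path, then delete the original
        s3.copy_object(Bucket=bucket, Key=partitioned_key(key),
                       CopySource={"Bucket": bucket, "Key": key})
        s3.delete_object(Bucket=bucket, Key=key)
```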

The partitioned structure is what makes Athena queries fast and cheap. Instead of scanning every log file ever written, a query for “last 7 days” only touches the relevant day-level partitions. The Glue Data Catalog describes the table schema and enables partition projection — Athena resolves the partition paths automatically, with no manual MSCK REPAIR TABLE maintenance required.
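To make the pruning concrete, a "last 7 days" query only needs a WHERE clause naming the relevant day-level partitions. Assuming `year`/`month`/`day` string partition keys matching the path layout above, a helper might build that predicate like this:

```python
from datetime import date, timedelta

def partition_predicate(end: date, days: int) -> str:
    """WHERE fragment restricting an Athena query to the `days` day-level
    partitions ending on `end` (inclusive)."""
    parts = [
        f"(year='{d.year:04d}' AND month='{d.month:02d}' AND day='{d.day:02d}')"
        for d in (end - timedelta(days=i) for i in range(days))
    ]
    return "(" + " OR ".join(parts) + ")"
```

With partition projection enabled, Athena resolves each `(year, month, day)` tuple directly to an S3 path, so only those prefixes are scanned.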

Query Without Infrastructure: Amazon Athena

Amazon Athena is a serverless query service. There are no clusters to provision, no query engines to manage, no always-on databases to pay for. You write standard SQL, Athena reads the relevant S3 objects, and you pay by the bytes scanned — approximately $5 per terabyte. At typical small-site log volumes, a weekly report query costs less than a cent.

The analytics pipeline runs 8 SQL queries each week, covering: top pages by hits, daily traffic trends, CloudFront cache performance, HTTP status code distribution, traffic sources by referrer, HTTP protocol adoption (HTTP/1.1, HTTP/2, HTTP/3), geographic distribution by edge POP with average latency, and a breakdown of all 4xx and 5xx errors. The results are assembled into an HTML email and delivered via Amazon SES.
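One of the eight analyses, "top pages by hits", might look like the following. The table and column names are assumptions for illustration (the real Glue DDL maps CloudFront's hyphenated field names like `cs-uri-stem` to whatever identifiers it chose), and the partition filter is simplified to a whole month for brevity:

```python
from datetime import date

# Assumed schema names; the deployed Glue catalog may differ.
TOP_PAGES_SQL = """
SELECT cs_uri_stem AS page, COUNT(*) AS hits
FROM analytics.cloudfront_logs
WHERE year = '{year}' AND month = '{month}'
GROUP BY cs_uri_stem
ORDER BY hits DESC
LIMIT 10
"""

def render(sql: str, when: date) -> str:
    """Fill in the partition filters for the month containing `when`
    (a weekly report would combine day-level predicates instead)."""
    return sql.format(year=f"{when.year:04d}", month=f"{when.month:02d}")
```

The other analyses follow the same shape: a GROUP BY over one log field (status code, referrer, protocol version, edge POP), restricted to the reporting window's partitions.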

Serverless log analytics pipeline: CloudFront access logs flow through S3, a Lambda partitioner, and Athena to produce a weekly HTML report delivered via SES

Automated Weekly Report: Lambda + EventBridge

A second Lambda function runs on a weekly schedule, triggered by an Amazon EventBridge cron rule every Monday at 08:00 UTC. It runs the 8 Athena queries sequentially, collects the results, and constructs a structured HTML email with one section per analysis. The email is sent via Amazon SES.
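The assembly step can be sketched as a pure function from query results to HTML, with SES delivery bolted on afterwards. The section titles, addresses, and HTML layout here are placeholders, not the production template:

```python
def build_report(sections: dict) -> str:
    """Render {section title: list of row dicts} into a minimal HTML report."""
    parts = ["<html><body><h1>Weekly Traffic Report</h1>"]
    for title, rows in sections.items():
        parts.append(f"<h2>{title}</h2><table>")
        if rows:
            # Header row from the first result's column names
            parts.append("<tr>" + "".join(f"<th>{k}</th>" for k in rows[0]) + "</tr>")
        for row in rows:
            parts.append("<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "".join(parts)

def send_report(html: str, sender: str, recipient: str) -> None:
    """Deliver the assembled report via Amazon SES."""
    import boto3  # lazy import keeps build_report testable offline
    boto3.client("ses").send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={"Subject": {"Data": "Weekly Traffic Report"},
                 "Body": {"Html": {"Data": html}}},
    )
```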

Everything — the S3 bucket, the Lambda functions, the EventBridge schedule, the Glue database and table schema, the Athena workgroup, and the SES identity — is defined in CloudFormation. There is no manual configuration. Deploying the stack creates the entire pipeline. Destroying the stack tears it all down cleanly.

Privacy Engineering: No Tracking by Design

The privacy properties of this architecture are not the result of configuration choices or vendor promises — they are structural.

There is no JavaScript on the client. No tracking pixel fires. No cookie is set or read. There is no unique identifier associated with any visitor. The analytics data is server-side HTTP access log data, collected by the infrastructure layer, not the application layer. Visitors who run ad blockers, disable JavaScript, or use privacy-focused browsers are measured identically to those who do not.

The IP address field in CloudFront logs is the one piece of information that qualifies as personally identifiable under PIPEDA and GDPR — it can, in principle, be used to identify an individual. S3 lifecycle rules automatically delete all log data after 90 days, which satisfies the data minimisation and retention limitation requirements of both frameworks. All data resides in AWS ca-central-1 (Canada), which satisfies PIPEDA’s data residency requirements and is compatible with GDPR’s adequacy framework.
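The 90-day retention is a single lifecycle rule. In this stack it is declared in CloudFormation; expressed as the equivalent boto3 payload for illustration, it amounts to:

```python
def lifecycle_config(days: int = 90) -> dict:
    """S3 lifecycle configuration that expires every object after `days` days."""
    return {
        "Rules": [{
            "ID": f"expire-logs-after-{days}-days",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: applies bucket-wide
            "Expiration": {"Days": days},
        }]
    }
```

Applied with `boto3.client("s3").put_bucket_lifecycle_configuration(...)` against the log bucket, this makes deletion an infrastructure guarantee rather than a scheduled job that can silently fail.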

There is no cookie banner on this site. There does not need to be. There are no cookies to consent to.

The Result

The analytics pipeline produces a weekly report that covers everything a consultancy website genuinely needs to know: which pages are attracting attention, where traffic is coming from, how the CDN cache is performing, which HTTP protocol versions clients are using, and where errors are occurring. It runs automatically, costs less than a dollar a month in total, and requires zero ongoing maintenance.

More importantly, it does this without compromising the people who visit the site. No third party receives data about them. No persistent identifier tracks them across sessions or across sites. Their IP address is deleted after 90 days. The architecture is compliant with PIPEDA, aligned with GDPR principles, and consistent with the intent of equivalent privacy legislation in other jurisdictions.

This is what privacy-respecting analytics looks like when you build it on infrastructure you own and control rather than on a SaaS vendor you are trusting to do the right thing. The technical complexity is low. The outcome is a system that works correctly, costs almost nothing, and carries no privacy debt.

For clients thinking about analytics, observability, or data residency in their own platforms — this is the approach we would recommend.

Similar challenges?

Get in touch to discuss your situation. The first call is free, and we’ll give you a direct view of how we’d approach it.

Book a Discovery Call