Atlas is document intelligence for Australian tax records.

The problem we built it for is narrow and specific. The ATO publishes Integrated Client Account (ICA) statements as PDFs. These contain the running record of every BAS lodgement, payment, refund, penalty, and interest charge for a business. They are also one of the few authoritative cross-references an accountant has against what their bookkeeping software says happened.

Extracting this data reliably is harder than it sounds. ICA documents have inconsistent layouts. Line items split across pages. Some fields are formulas, some are plain text, some appear only conditionally. The format has changed at least twice in recent memory. Generic PDF tools and standard OCR get most of the way there and then fail in ways that quietly poison downstream reconciliation.

Why not just use GPT?

Two reasons. The first is accuracy. We need exact dollar amounts, exact dates, and exact transaction types, every time, with no hallucinated values. General-purpose language models are not the right tool for precise structured extraction over documents that follow strict accounting conventions.

The second is cost and latency. A firm processing ICAs for hundreds of clients cannot pay token costs and wait on API round trips for every page of every document.

Atlas is a deep learning system built specifically for this. The model is trained on Australian tax documents. It knows what an ICA looks like, what fields it has, what the relationships between rows mean. When it encounters a layout it hasn’t seen, it flags the section for review rather than guessing.
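To make the flag-rather-than-guess behaviour concrete, here is a minimal sketch, not Atlas's actual implementation. It assumes each extracted field carries a model confidence score; the names ExtractedField, REVIEW_THRESHOLD, and triage, and the threshold value itself, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractedField:
    """One field read from an ICA page (hypothetical structure)."""
    name: str
    value: Optional[str]   # None when nothing could be read
    confidence: float      # assumed model score in [0, 1]


# Assumed cut-off; a real value would come from evaluation data.
REVIEW_THRESHOLD = 0.98


def triage(
    fields: List[ExtractedField],
    threshold: float = REVIEW_THRESHOLD,
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into (accepted, needs_review).

    A field is accepted only if it has a value and its confidence
    meets the threshold; everything else goes to a human reviewer.
    """
    accepted: List[ExtractedField] = []
    needs_review: List[ExtractedField] = []
    for field in fields:
        if field.value is None or field.confidence < threshold:
            needs_review.append(field)
        else:
            accepted.append(field)
    return accepted, needs_review
```

The point of the design is that an uncertain value never reaches reconciliation silently: it is either accepted above the threshold or held for a person to check.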

Where it fits

Atlas feeds the Relay reconciliation engine. The structured ICA data becomes the authoritative record that Relay matches against the firm’s Xero data. Without reliable extraction, the rest of the engine has nothing to reconcile against.
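As a rough illustration of what matching against the ledger can mean, the sketch below compares extracted ICA transactions with ledger entries on exact date, amount, and type. It is an assumption-laden toy, not Relay's matching logic: the Txn type and the reconcile function are hypothetical, and the ledger entries are plain in-memory records rather than anything fetched from Xero.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class Txn:
    """A single transaction (hypothetical shape for illustration)."""
    when: date
    amount: Decimal   # Decimal avoids float rounding on dollar amounts
    kind: str         # e.g. "payment", "refund", "penalty"


def reconcile(
    ica: List[Txn], ledger: List[Txn]
) -> Tuple[List[Txn], List[Txn], List[Txn]]:
    """Return (matched, missing_from_ledger, extra_in_ledger).

    Matching is exact on all three fields. A Counter is used so that
    duplicate transactions are matched one-for-one, not collapsed.
    """
    remaining = Counter(ledger)
    matched: List[Txn] = []
    missing: List[Txn] = []
    for txn in ica:
        if remaining[txn] > 0:
            remaining[txn] -= 1
            matched.append(txn)
        else:
            missing.append(txn)
    extra = list(remaining.elements())
    return matched, missing, extra
```

Because the comparison is exact, a single misread digit in an extracted amount shows up as one false "missing" entry and one false "extra" entry, which is why extraction accuracy matters so much upstream.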

Atlas is currently in training and evaluation. The next phase is integration into the Relay upload pipeline.