Engineering Excellence: Our Sydney Core Network
When we set out to build The IT Dept's network, we had one goal: create infrastructure that we'd want to buy from ourselves. That meant no compromises on redundancy, no cutting corners on hardware, and no pretending that complexity doesn't exist.
Today, we're pulling back the curtain on exactly how our Sydney core network operates.
Physical Architecture
Dual Points of Presence
Our Sydney infrastructure spans two Tier III+ data centers:
NextDC S1 (Macquarie Park)
- Primary core location
- Dual Juniper MX204 routers in active/active configuration
- Redundant 100G fiber paths between sites
- Direct connectivity to EdgeIX
- Multiple provider cross-connects
Equinix SY1 (Mascot)
- Secondary core location (geo-redundant)
- Dual Juniper MX204 routers mirroring S1 topology
- Redundant 100G fiber paths for transport diversity
- Direct connectivity to IX Australia
- Independent upstream providers for path diversity
The physical separation between these sites (approximately 15km) ensures that localized incidents - whether building evacuations, power grid failures, or fiber cuts in a specific metro area - cannot take down our entire network.
Hardware Selection: Why Juniper MX204
The MX204 is purpose-built for carrier edge and aggregation, and it's exactly what we need:
Specifications:
- 4x 100GE QSFP28 ports plus 8x 10GE SFP+ ports
- 400 Gbps of forwarding capacity in a compact 1RU form factor
- Full Junos feature set (BGP, MPLS, EVPN, VXLAN)
- Hardware-accelerated encryption and QoS
- Hot-swappable power supplies and fan trays
Why This Platform:
- Proven at Scale: The MX series powers some of the largest carrier networks globally
- Full BGP Table Support: Critical for IP Transit customers who need the full Internet routing table
- VXLAN in Hardware: Overlay switching at line rate without CPU penalty
- Future-Proof: Our current utilization is under 30%, giving us years of growth headroom
We run these routers in pairs at each site. Both routers are active, handling production traffic simultaneously using ECMP (Equal-Cost Multi-Path) routing. If one router fails, the other seamlessly absorbs the load without customer impact.
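For the curious, here's a minimal sketch of the flow-hashing idea behind ECMP. The hash function, 5-tuple fields, and next-hop names are purely illustrative - this is the concept, not our Junos configuration:

```python
# Illustrative flow-hash ECMP: every packet in a flow hashes to the same
# next hop (preserving in-order delivery), while different flows spread
# across both active routers.
import hashlib

NEXT_HOPS = ["mx204-a", "mx204-b"]  # the two active routers at a site

def pick_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int) -> str:
    """Hash the 5-tuple and map it onto the set of equal-cost paths."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return NEXT_HOPS[digest % len(NEXT_HOPS)]

# Same flow -> same router, every time; other flows land on the other box.
print(pick_next_hop("203.0.113.10", "198.51.100.7", 6, 51515, 443))
```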
The VXLAN Overlay: Why We Chose It
Provisioning services across two physical data centers, multiple upstream providers, and diverse customer handoff locations presents a challenge: how do you make it simple?
Enter VXLAN (Virtual Extensible LAN).
What VXLAN Gives Us
Layer 2 Extension Over Layer 3
VXLAN lets us create logical Layer 2 networks that span physical locations. A customer's NBN service at SY1 can seamlessly extend to S1, or vice versa, without them knowing (or caring) about the underlying complexity.
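To make the encapsulation concrete, here's a rough sketch built with scapy (assumed installed); the MAC addresses, VTEP IPs, and VNI are invented for illustration:

```python
# What a VXLAN-encapsulated customer frame looks like on the wire: the
# original Layer 2 frame rides inside an outer IP/UDP header between the
# two sites' VTEPs, tagged with the customer's VNI.
from scapy.layers.inet import IP, UDP
from scapy.layers.l2 import Ether, Dot1Q
from scapy.layers.vxlan import VXLAN

# The customer's original frame, VLAN tag and all.
inner = (Ether(src="aa:bb:cc:00:00:01", dst="aa:bb:cc:00:00:02")
         / Dot1Q(vlan=100) / IP(dst="192.0.2.10") / UDP(dport=53))

# The overlay wrapper: the underlay only ever routes the outer header,
# so the inner frame crosses from SY1 to S1 untouched.
outer = (Ether()
         / IP(src="10.0.1.1", dst="10.0.2.1")   # VTEP to VTEP
         / UDP(sport=49152, dport=4789)          # VXLAN's UDP port
         / VXLAN(vni=10100)                      # the customer's segment
         / inner)

outer.show()
```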
Simplified Provisioning
When we onboard a new customer:
- We create a VNI (VXLAN Network Identifier) - their isolated network
- We map their VLAN to that VNI
- Traffic flows across our core automatically, regardless of which site it enters
No manual routing updates. No complex policy changes. The overlay handles it.
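Here's a simplified model of what that onboarding step produces. The field names and VNI numbering are illustrative, not our actual provisioning schema:

```python
# A cut-down view of the per-customer overlay record created at turn-up:
# one VNI per customer, with their access VLAN bound to it.
from dataclasses import dataclass

@dataclass(frozen=True)
class OverlayService:
    customer: str
    vni: int           # the customer's isolated VXLAN segment
    access_vlan: int   # the VLAN on their handoff port
    handoff_site: str  # "S1" or "SY1" - irrelevant to reachability once mapped

def provision(customer: str, access_vlan: int, handoff_site: str,
              vni: int) -> OverlayService:
    """Bind a customer VLAN to a VNI; the overlay does the rest."""
    return OverlayService(customer, vni, access_vlan, handoff_site)

svc = provision("example-isp", access_vlan=100, handoff_site="SY1", vni=10100)
print(svc)
```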
Traffic Engineering Flexibility
Because VXLAN encapsulates customer traffic, we can steer it across our network based on real-time conditions (a simple path-selection sketch follows this list):
- Load balancing across diverse fiber paths
- Path failover in milliseconds if a link degrades
- QoS policies that follow the traffic, not the physical port
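A toy version of that steering logic, with invented path names and utilisation figures:

```python
# Prefer a healthy inter-site path, then the one with the most headroom.
PATHS = [
    {"name": "path-a-chatswood",  "up": True, "util_pct": 41.0},
    {"name": "path-b-parramatta", "up": True, "util_pct": 18.5},
]

def choose_path(paths):
    healthy = [p for p in paths if p["up"]]
    if not healthy:
        raise RuntimeError("no inter-site path available")
    return min(healthy, key=lambda p: p["util_pct"])

print(choose_path(PATHS)["name"])  # -> path-b-parramatta
```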
Security Isolation
Each customer gets their own VNI. Their traffic is logically isolated from every other customer's at the forwarding plane, even when it traverses the same physical 100G link. It's the network equivalent of giving each customer their own dedicated switch - without the cost of dedicated hardware.
Redundant Fiber Paths
Our two sites are interconnected via diverse dark fiber paths:
- Path A: Northern metro route (via Chatswood)
- Path B: Western metro route (via Parramatta)
These paths are physically separated - different conduits, different street infrastructure, different risk profiles. A construction crew cutting a fiber in Lane Cove can't impact a path running through Silverwater.
We light both paths with 100G coherent optics, providing us with 200G of inter-site capacity. Traffic load-balances across both links during normal operations. If one path fails, the other absorbs the full load without customer-visible impact.
Upstream Connectivity
We maintain diverse upstream transit providers from each location:
From S1:
- Provider A: 100G to Tier-1 carrier (US-heavy routes)
- Provider B: 100G to regional carrier (APAC-optimized)
- Leaptel: Redundant connections for NBN aggregation
- EdgeIX: 100G peering port
From SY1:
- Provider C: 100G to Tier-1 carrier (diverse from S1)
- Provider D: 100G to content-heavy transit
- Leaptel: Redundant connections for NBN aggregation (geo-diverse)
- IX Australia: 100G peering port
This multi-homing strategy (sketched in code after this list) ensures:
- No single vendor dependency - if one provider has issues, we route around them
- Optimized latency - APAC-bound traffic takes the most direct path
- Peering benefits - IX connections give us single-hop access to major content providers (Google, Cloudflare, AWS, etc.)
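Here's a toy view of how that plays out in route selection. The prefixes and local-preference values are invented, and the ordering (peering preferred over transit) is the conventional one rather than a statement of our exact policy:

```python
# Toy BGP-style route selection across a multi-homed edge: prefer the
# peering route while it's up, fall back to transit when it isn't.
ROUTES = [
    {"prefix": "203.0.113.0/24", "via": "EdgeIX peer", "local_pref": 300, "up": True},
    {"prefix": "203.0.113.0/24", "via": "Provider A",  "local_pref": 200, "up": True},
    {"prefix": "203.0.113.0/24", "via": "Provider B",  "local_pref": 200, "up": False},
]

def best_route(candidates):
    alive = [r for r in candidates if r["up"]]
    return max(alive, key=lambda r: r["local_pref"]) if alive else None

print(best_route(ROUTES)["via"])  # -> EdgeIX peer; falls back if that path dies
```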
NBN Aggregation Architecture
For our Wholesale NBN customers, the Juniper MX204 routers handle Layer 2 aggregation using QinQ (802.1ad):
- Customer NBN services arrive at our NNI (Network-Network Interface) with dual VLAN tags
- The outer tag identifies the service type (e.g., nbn-100-20)
- The inner tag is customer-specific
- We map these into VXLAN VNIs for transport across our core
- Handoff to customers is a clean Layer 2 circuit via Megaport, EdgeIX, or physical cross-connect
This approach means customers get full control of their Layer 2 network - they can run their own DHCP, their own routing, their own VLANs. We're just the pipe.
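For a feel of what that double-tag handling looks like, here's a stripped-down mapping sketch; the tag values, customer names, and VNI numbers are examples only:

```python
# (outer S-tag, inner C-tag) as seen on the NNI -> the customer's VNI.
from typing import NamedTuple

class NbnService(NamedTuple):
    customer: str
    vni: int

QINQ_TO_VNI = {
    (1100, 10): NbnService("customer-a", 20010),  # e.g. an nbn-100-20 service
    (1100, 11): NbnService("customer-b", 20011),
}

def classify(s_tag: int, c_tag: int) -> NbnService:
    """Decide which customer overlay a double-tagged frame belongs to."""
    try:
        return QINQ_TO_VNI[(s_tag, c_tag)]
    except KeyError:
        raise ValueError(f"unknown service tags {s_tag}/{c_tag}") from None

print(classify(1100, 10))
```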
Monitoring & Automation
A network this complex requires serious observability:
Real-Time Metrics:
- SNMP polling every 30 seconds for interface stats
- Streaming telemetry for BGP session state, VXLAN tunnel health, and optical power levels
- Synthetic traffic injection to measure real-world latency and packet loss (a minimal probe sketch follows this list)
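As a flavour of those synthetic measurements, here's a minimal probe that times a TCP handshake. The target host and timeout are placeholders, not our actual probe targets:

```python
# Time a TCP handshake to a target; None means the probe was lost/timed out.
import socket
import time

def tcp_probe(host: str, port: int = 443, timeout: float = 1.0) -> float | None:
    """Return the handshake time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

rtt = tcp_probe("example.net")
print(f"handshake: {rtt:.2f} ms" if rtt is not None else "probe lost")
```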
Alerting:
- Thresholds tuned per-link (a 10G customer link has different tolerances than a 100G IX port; see the sketch after this list)
- Escalation to on-call engineers within 60 seconds of detection
- Automated ticket creation in our NOC system
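A sketch of what per-link tuning means in practice - the link classes and percentages are illustrative, not our production thresholds:

```python
# Different link classes get different alerting thresholds.
THRESHOLDS = {
    "10g-customer": {"warn_pct": 70, "crit_pct": 85},
    "100g-ix":      {"warn_pct": 80, "crit_pct": 92},
}

def evaluate(link_class: str, util_pct: float) -> str:
    t = THRESHOLDS[link_class]
    if util_pct >= t["crit_pct"]:
        return "critical"  # page the on-call engineer
    if util_pct >= t["warn_pct"]:
        return "warning"   # raise a NOC ticket
    return "ok"

print(evaluate("10g-customer", 88.0))  # -> critical
print(evaluate("100g-ix", 88.0))       # -> warning
```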
Provisioning Automation:
- RESTful API for service turn-up, used by our customer portal (example call after this list)
- Ansible playbooks for configuration deployment
- Git-backed config versioning with automatic rollback on failure
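A hypothetical example of what a service turn-up call could look like - the endpoint URL, payload fields, and auth handling are invented for illustration and are not our published API:

```python
# Submit a Layer 2 service order through a (hypothetical) provisioning API.
import os
import requests

API = "https://portal.example.net/api/v1"  # placeholder, not our real endpoint

def order_l2_circuit(customer_id: str, vlan: int, handoff: str) -> dict:
    """POST a service order and return the created service record."""
    resp = requests.post(
        f"{API}/services",
        json={"customer_id": customer_id, "vlan": vlan, "handoff": handoff},
        headers={"Authorization": f"Bearer {os.environ['PORTAL_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# order_l2_circuit("CUST-1234", vlan=100, handoff="EdgeIX")
```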
Why This Matters For Customers
If you're reading this thinking "that's overkill for my 10G handoff" - that's exactly the point.
We built infrastructure that can survive:
- A complete site failure
- A fiber cut
- A provider going dark
- A router catching fire
And in all those scenarios, your traffic keeps flowing.
This is what carrier-grade means. Not marketing fluff, but actual redundancy at every layer: hardware, software, connectivity, and physical location.
Performance In Practice
Since going live, our Sydney core has maintained:
- 99.99% uptime (measured per-customer circuit)
- Sub-2ms inter-site latency (S1 ↔ SY1)
- Zero packet loss during normal operations
- < 50ms failover time during simulated failures
These aren't theoretical numbers. They're measured, logged, and reported.
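For context, here's the downtime budget those uptime figures translate to (simple arithmetic, assuming a 365-day year):

```python
# Convert an uptime percentage into an annual downtime budget.
SECONDS_PER_YEAR = 365 * 24 * 3600

for nines in (99.9, 99.99, 99.999):
    budget_s = SECONDS_PER_YEAR * (1 - nines / 100)
    print(f"{nines}% uptime -> {budget_s / 60:.1f} minutes of downtime per year")

# 99.99% works out to roughly 52.6 minutes per customer circuit per year.
```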
What's Next
We're currently engineering our Melbourne expansion, which will follow the same architecture principles:
- Dual sites (NextDC M1 and Equinix ME1)
- Geo-redundant dark fiber
- Diverse upstream providers
- VXLAN overlay across all cities
When it launches, a customer will be able to order a single Layer 2 circuit that spans Sydney and Melbourne, with automatic failover and traffic engineering handled transparently.
Technical Specifications Summary
| Component | Specification |
|---|---|
| Router Model | Juniper MX204 |
| Routers per Site | 2 (active/active) |
| Total Core Routers | 4 (2x S1, 2x SY1) |
| Inter-Site Fiber | 2x 100G diverse paths |
| Upstream Capacity | 400G+ aggregated |
| Overlay Technology | VXLAN (hardware accelerated) |
| BGP ASN | AS152590 |
| Peering Locations | EdgeIX, IX Australia |
| NBN Aggregation | Leaptel (redundant connections) |
Transparency Matters
Other providers will tell you they have "redundant network architecture." We're showing you the actual hardware, the actual topology, and the actual failure domains.
If you have questions about how your traffic would route, what happens during a failure scenario, or why we made specific design decisions - call us. Our NOC engineers love talking about this stuff.
02 4398 7089 | hello@theitdept.au
Because infrastructure shouldn't be a mystery.
