TAO: How Facebook Built a Graph Database for 3 Billion People
In 2009, Facebook engineers faced a crisis. Their social graph — the connections between 300 million users — was stored in MySQL. Every "friend request accepted" triggered cascading queries across sharded databases. Read latency was spiking, and cache invalidation was a nightmare. The infrastructure that had scaled Facebook from a dorm room to hundreds of millions of users was now its biggest bottleneck.
The solution would become one of the most impressive feats in distributed systems engineering: TAO — The Associations and Objects store.
TAO handles 1 billion+ queries per second with sub-millisecond latency, serving a graph with 400+ billion edges across billions of objects.
This is the story of how Facebook built a database for the social graph of humanity.
The Problem: Why MySQL Couldn't Scale
The Memcache Band-Aid
By 2009, Facebook's architecture looked like this:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Web Tier │────▶│ Memcached │────▶│ MySQL │
│ (PHP/Hack) │ │ (Cache) │ │ (Sharded) │
└─────────────┘ └─────────────┘ └─────────────┘
The pattern was "look-aside caching":
- Check Memcached for data
- If miss: query MySQL, populate cache
- On write: update MySQL, invalidate cache
This worked at small scale. It collapsed at Facebook scale.
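The look-aside pattern above fits in a few lines of code. Here is a minimal Python sketch, with in-memory dicts standing in for Memcached and MySQL (class and method names are illustrative, not Facebook's actual code):

```python
class DictDB:
    """Toy stand-in for a MySQL shard."""
    def __init__(self):
        self.rows = {}
    def get(self, key):
        return self.rows.get(key)
    def set(self, key, value):
        self.rows[key] = value

class LookAsideCache:
    """Look-aside (cache-aside) caching: the application, not the
    cache, decides when to read through to the database."""
    def __init__(self, db):
        self.db = db       # durable storage
        self.cache = {}    # stand-in for Memcached

    def read(self, key):
        if key in self.cache:           # 1. check cache
            return self.cache[key]
        value = self.db.get(key)        # 2. miss: query the database
        self.cache[key] = value         #    and populate the cache
        return value

    def write(self, key, value):
        self.db.set(key, value)         # 3. update the database first...
        self.cache.pop(key, None)       #    ...then invalidate the cache
```

Note that `write` invalidates rather than updates the cache entry — the classic choice that, at scale, opens the door to the thundering-herd problem described next.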
The Three Killers
1. Thundering Herds
When a cache key expired or was invalidated, hundreds of concurrent requests would simultaneously hit MySQL for the same data:
Cache Miss on popular user's friend list
├─▶ Request 1 → MySQL query
├─▶ Request 2 → MySQL query
├─▶ Request 3 → MySQL query
└─▶ ... 500 more requests → MySQL overwhelmed
2. Graph Query Complexity
Social graph queries are inherently multi-hop:
-- "Get friends of friends who liked this post"
SELECT DISTINCT u.*
FROM users u
JOIN friendships f1 ON f1.friend_id = u.id
JOIN friendships f2 ON f2.user_id = f1.user_id
JOIN post_likes pl ON pl.user_id = u.id
WHERE f2.friend_id = @current_user AND pl.post_id = @post_id
This single query could touch millions of rows across dozens of shards.
3. Cross-Shard Consistency Hell
When Alice befriends Bob:
- Write friendship record to Alice's shard
- Write reverse friendship to Bob's shard
- Invalidate cache on Alice's server
- Invalidate cache on Bob's server
- Invalidate "friends of Alice" cached lists
- Invalidate "friends of Bob" cached lists
One operation triggered 6+ distributed writes and invalidations. Race conditions were inevitable.
The Insight: Model Everything as Objects and Associations
The key insight was that almost everything in Facebook could be represented as a graph:
- Objects: Users, Posts, Photos, Pages, Comments
- Associations: Friendships, Likes, Tags, Follows
Instead of SQL tables, TAO uses a simple graph model:
Object:
┌─────────────────────────────────────────────────┐
│ id (64-bit) │ type │ data (blob) │ ver │
│ 4531234567 │ USER │ {name:...} │ 42 │
└─────────────────────────────────────────────────┘
Association:
┌──────────────────────────────────────────────────────────┐
│ id1 │ atype │ id2 │ timestamp │ data │
│ 123 │ FRIEND │ 456 │ 1609459200 │ {since: 2021} │
└──────────────────────────────────────────────────────────┘
The API Simplicity
TAO exposes just six core operations:
| Operation | Description | Example |
|---|---|---|
| `obj_get(id)` | Get object by ID | Get user profile |
| `obj_set(id, data)` | Create/update object | Update profile |
| `assoc_add(id1, atype, id2, data)` | Create association | Add friend |
| `assoc_del(id1, atype, id2)` | Delete association | Unfriend |
| `assoc_get(id1, atype)` | Get all associations | Get friend list |
| `assoc_count(id1, atype)` | Count associations | Friend count |
This minimal API handles 99% of Facebook's data access patterns.
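To make the model concrete, here is a toy in-memory sketch of the six operations (purely illustrative — TAO itself is a distributed C++ system, and the versioning and timestamp details here are simplified assumptions):

```python
import time
from collections import defaultdict

class TinyTAO:
    """Toy in-memory model of TAO's six-operation API. Objects carry
    (type, data, version); associations are kept per (id1, atype),
    newest first, matching TAO's time-ordered association lists."""

    def __init__(self):
        self.objects = {}                 # id -> {"type", "data", "ver"}
        self.assocs = defaultdict(list)   # (id1, atype) -> [(ts, id2, data)]

    def obj_get(self, oid):
        return self.objects.get(oid)

    def obj_set(self, oid, otype, data):
        ver = self.objects.get(oid, {}).get("ver", 0) + 1
        self.objects[oid] = {"type": otype, "data": data, "ver": ver}

    def assoc_add(self, id1, atype, id2, data=None, ts=None):
        lst = self.assocs[(id1, atype)]
        lst.append((ts if ts is not None else time.time(), id2, data))
        lst.sort(key=lambda e: -e[0])     # keep newest first

    def assoc_del(self, id1, atype, id2):
        key = (id1, atype)
        self.assocs[key] = [e for e in self.assocs[key] if e[1] != id2]

    def assoc_get(self, id1, atype):
        return [id2 for _, id2, _ in self.assocs[(id1, atype)]]

    def assoc_count(self, id1, atype):
        return len(self.assocs[(id1, atype)])
```

A friend list becomes `assoc_get(user_id, "FRIEND")`, a friend count becomes `assoc_count(user_id, "FRIEND")` — no joins, no shard fan-out in the API itself.
<test>
t = TinyTAO()
t.obj_set(123, "USER", {"name": "alice"})
t.obj_set(123, "USER", {"name": "alice b"})
assert t.obj_get(123)["ver"] == 2
t.assoc_add(123, "FRIEND", 456, ts=1)
t.assoc_add(123, "FRIEND", 789, ts=2)
assert t.assoc_get(123, "FRIEND") == [789, 456]
assert t.assoc_count(123, "FRIEND") == 2
t.assoc_del(123, "FRIEND", 456)
assert t.assoc_count(123, "FRIEND") == 1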
The Architecture: Three Tiers of Intelligence
TAO's architecture is deceptively simple but engineered for planetary scale:
┌─────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ (GraphQL, Feed Service, Messenger, Search) │
└─────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────────────┐
│ TAO LEADER CACHE │
│ - Handles ALL writes │
│ - Owns cache invalidation │
│ - One leader per shard │
└─────────────────────────────────────────────────────────────────────┘
│ │
┌───────────────────────┐ ┌───────────────────────┐
│ TAO Follower Cache │ │ TAO Follower Cache │
│ (Read replicas) │ │ (Read replicas) │
│ - Handles reads │ │ - Handles reads │
│ - Eventually │ │ - Geographically │
│ consistent │ │ distributed │
└───────────────────────┘ └───────────────────────┘
│ │
┌─────────────────────────────────────────────────────────────────────┐
│ MYSQL STORAGE │
│ - Persistent storage │
│ - Sharded by object_id │
│ - 1000s of shards │
└─────────────────────────────────────────────────────────────────────┘
Why This Works
Leader Cache: ALL writes go through a single leader per shard. This eliminates cache consistency problems — the leader is the source of truth.
Follower Caches: Read replicas in every data center. They subscribe to the leader for invalidations.
MySQL: Still the durable storage, but TAO absorbs 99%+ of reads before they reach MySQL.
The Hard Problems (And How They Solved Them)
Problem 1: Thundering Herds
Solution: In-Flight Request Coalescing
When multiple requests ask for the same data simultaneously, TAO coalesces them:
Incoming: assoc_get(user:123, FRIEND)
├─▶ Request 1 ─┐
├─▶ Request 2 ─┼──▶ Single cache lookup / MySQL query
├─▶ Request 3 ─┘
└─▶ All three get the same response
Implementation insight: TAO maintains a map of in-flight requests. New requests for the same key attach as "waiters" rather than triggering duplicate fetches.
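The leader/waiter idea can be sketched in Python with a lock and an event per key — an assumed shape, since TAO's real implementation is internal to Facebook:

```python
import threading

class CoalescingFetcher:
    """In-flight request coalescing: concurrent requests for the same
    key share a single backend fetch. The first arrival becomes the
    leader; later arrivals attach as waiters."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn   # the expensive backend call
        self.lock = threading.Lock()
        self.in_flight = {}        # key -> {"event", "result"}

    def get(self, key):
        with self.lock:
            entry = self.in_flight.get(key)
            leader = entry is None
            if leader:
                entry = {"event": threading.Event(), "result": None}
                self.in_flight[key] = entry
        if leader:
            entry["result"] = self.fetch_fn(key)   # only the leader fetches
            with self.lock:
                del self.in_flight[key]            # new requests start fresh
            entry["event"].set()                   # wake all waiters
        else:
            entry["event"].wait()                  # waiters block on the leader
        return entry["result"]
```

With 500 concurrent requests for the same key, `fetch_fn` runs exactly once and all 500 callers get its result — which is the whole point.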
Problem 2: Hot Spots (Celebrity Problem)
When Taylor Swift posts, millions fetch assoc_get(taylor_swift, LIKES) simultaneously. Even with coalescing, this overwhelms a single cache server.
Solution: Hierarchical Caching with Consistent Hashing
Taylor Swift's data:
│
├─▶ Primary Cache Server (owner)
│
└─▶ N Secondary Cache Servers (read-only replicas)
├─▶ Server A (handles 20% of reads)
├─▶ Server B (handles 20% of reads)
├─▶ Server C (handles 20% of reads)
├─▶ Server D (handles 20% of reads)
└─▶ Server E (handles 20% of reads)
Hot keys are automatically replicated across multiple cache tiers. The routing layer uses weighted random selection to spread load.
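A routing layer along these lines might look like the following sketch — an illustrative policy, not TAO's actual routing code, and using uniform random selection over the replica set where the production system weights by capacity:

```python
import hashlib
import random

def pick_cache_server(key, servers, hot_keys, replicas=5):
    """Hot-key fan-out: a normal key hashes to a single owner on the
    ring; a key flagged hot is also served by the next `replicas`
    servers, chosen at random to spread read load."""
    # Build a deterministic ring ordering from server-name hashes.
    ring = sorted(servers, key=lambda s: hashlib.md5(s.encode()).hexdigest())
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    owner_idx = h % len(ring)
    if key not in hot_keys:
        return ring[owner_idx]           # cold key: single owner
    # Hot key: spread reads over the owner plus its ring successors.
    candidates = [ring[(owner_idx + i) % len(ring)]
                  for i in range(replicas + 1)]
    return random.choice(candidates)
```

Cold keys keep the usual one-owner placement (so invalidation stays cheap); only keys detected as hot pay the cost of replica fan-out.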
Problem 3: Association Lists (Pagination)
A user can have 5,000 friends — you can't load all 5,000 associations on every request.
Solution: Time-Ordered Association Lists
Associations are stored sorted by timestamp (descending). TAO provides:
assoc_range(id1, atype, offset, limit)
This returns associations in recency order with efficient pagination. The storage uses a B-tree index on (id1, atype, timestamp).
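A minimal sketch of such a time-ordered list, using a sorted in-memory structure in place of the B-tree (the negated-timestamp trick is an implementation convenience of this sketch, not a claim about TAO's storage format):

```python
import bisect

class AssocList:
    """Time-ordered association list with offset-based pagination,
    mirroring assoc_range(id1, atype, offset, limit). Entries are kept
    sorted by timestamp descending (newest first)."""

    def __init__(self):
        self.entries = []   # list of (-timestamp, id2), kept sorted

    def add(self, id2, timestamp):
        # Negating the timestamp makes ascending sort order == newest first.
        bisect.insort(self.entries, (-timestamp, id2))

    def range(self, offset, limit):
        page = self.entries[offset:offset + limit]
        return [(id2, -neg_ts) for neg_ts, id2 in page]
```

Because the list is already in recency order, "first page of friends" is just `range(0, 20)` — no sort at query time, which matters when the same list is read millions of times.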
Problem 4: Bidirectional Edges
When Alice friends Bob:
- Store `(Alice, FRIEND, Bob)` on Alice's shard
- Store `(Bob, FRIEND, Alice)` on Bob's shard
These must be atomically consistent.
Solution: Two-Phase Write with Background Repair
Phase 1: Write to Alice's shard (primary)
Phase 2: Async write to Bob's shard (inverse)
If Phase 2 fails:
- Background repair daemon detects missing inverse
- Re-applies the write
TAO accepts brief inconsistency (seconds) for availability. Background repair heals any divergence.
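The forward-write-then-async-inverse flow can be sketched as follows, with dicts standing in for shards and a flag simulating a failed inverse write (a toy model of the pattern, not Facebook's repair daemon):

```python
class BidirectionalStore:
    """Two-phase bidirectional edge write with background repair:
    the forward edge is written synchronously, the inverse edge
    asynchronously; failures are queued for a repair pass."""

    def __init__(self):
        self.shards = {}            # node -> set of (atype, other_node)
        self.pending_repairs = []   # inverse writes that failed

    def _shard(self, node):
        return self.shards.setdefault(node, set())

    def assoc_add(self, id1, atype, id2, inverse_fails=False):
        self._shard(id1).add((atype, id2))        # phase 1: forward edge
        if inverse_fails:
            # Phase 2 failed: record it so the repair daemon can retry.
            self.pending_repairs.append((id1, atype, id2))
        else:
            self._shard(id2).add((atype, id1))    # phase 2: inverse edge

    def repair(self):
        """Background daemon: re-apply any missing inverse edges."""
        while self.pending_repairs:
            id1, atype, id2 = self.pending_repairs.pop()
            self._shard(id2).add((atype, id1))
```

Between the failed phase 2 and the repair pass, Bob's shard briefly lacks the inverse edge — exactly the seconds-long inconsistency TAO accepts in exchange for availability.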
The Numbers That Matter
From Facebook's 2013 TAO paper and subsequent engineering talks:
| Metric | Value |
|---|---|
| Objects stored | Billions |
| Associations stored | 400+ billion edges |
| Read QPS | 1+ billion queries/second |
| Cache hit rate | 99.8% |
| Read latency (p50) | under 1ms |
| Read latency (p99) | under 5ms |
| Write latency | under 10ms |
| Data centers | 6+ global regions |
Why 99.8% Cache Hit Rate?
Social graph access patterns are highly skewed:
- 80% of reads are for data written in the last week
- Power-law distribution: 1% of objects get 50% of reads
- Friends lists change infrequently (high cache TTL)
TAO exploits this with tiered caching and aggressive prefetching.
What Engineers Can Learn
1. Optimize for Your Access Patterns
TAO's 6-operation API isn't a limitation — it's a feature. By constraining the API, Facebook could optimize the hell out of those specific patterns.
Lesson: Generic solutions (like raw SQL) are flexible but slow. Constrained solutions (like TAO's API) can be orders of magnitude faster.
2. Embrace Eventual Consistency (Where Safe)
TAO doesn't guarantee immediate consistency across data centers. If you're in Singapore and your friend accepts a request in New York, you might not see it for a few seconds.
Lesson: Users rarely notice sub-second inconsistency. Trading strict consistency for availability and latency is often the right call for social applications.
3. Put Smart Caches in Front of Dumb Storage
MySQL is "dumb" storage in TAO — it just persists data. The TAO cache layer handles all the hard problems: coalescing, hot spots, invalidation.
Lesson: Separate your caching logic from your storage logic. Let each layer do what it's good at.
4. Design for Failure Recovery, Not Failure Prevention
TAO's two-phase writes can fail mid-way. Instead of complex distributed transactions, TAO uses background repair.
Lesson: In distributed systems, failures are inevitable. Design systems that heal themselves rather than trying to prevent all failures.
The Legacy: TAO's Influence
TAO's ideas have spread throughout the industry:
- LinkedIn's Espresso: a distributed data store whose caching tier echoes ideas TAO popularized
- Twitter's Manhattan: another planet-scale store built on a similar caches-in-front-of-durable-storage split
- Open Source: projects like Dgraph and JanusGraph draw on TAO-style objects-and-associations modeling
Even if you never build at Facebook's scale, understanding TAO teaches you how to think about graph storage, caching hierarchies, and consistency trade-offs.
Conclusion
TAO is one of the most impressive pieces of infrastructure engineering in the history of computing. It transformed a collapsing MySQL-based system into a planet-scale graph database serving billions of users.
The key insight wasn't a single breakthrough — it was a series of practical engineering decisions:
- Constrain the API to optimize for common patterns
- Use hierarchical caching with smart invalidation
- Accept eventual consistency where users won't notice
- Build self-healing systems instead of failure-proof ones
For engineers, TAO is a masterclass in building systems that work at the edge of what's possible.
"Any sufficiently advanced caching is indistinguishable from a distributed database."
— Paraphrasing Arthur C. Clarke
Further Reading
- TAO: Facebook's Distributed Data Store for the Social Graph — Original USENIX ATC 2013 paper
- Scaling Memcache at Facebook — The predecessor architecture
- Facebook Engineering Blog: TAO — Engineering announcement