What Is GraphQL and How Does Apollo's Cache Work?

Mar 15, 2024

In modern web and app development, fetching and managing data efficiently is a core concern. RESTful APIs can run into problems such as multiple round trips, over-fetching, and under-fetching when data requirements are complex. GraphQL was designed to address these problems, and Apollo Client, with its built-in cache, gives front-end developers a practical toolset for working with GraphQL.

What is GraphQL?

GraphQL is an API query language developed and open-sourced by Facebook, as well as a server-side runtime for executing those queries. It allows clients to specify the exact structure of the data they need, and the server returns a JSON object that precisely matches that structure.
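As a rough illustration of "the response matches the shape of the query", the sketch below holds a GraphQL query as a string and uses a small hypothetical helper, `select`, to mimic field selection on a plain object. The `user` record, its field names, and the helper itself are assumptions made for this example and are not part of any GraphQL library.

```typescript
// A GraphQL query is a string sent to the server. This one asks for a
// user's name and the titles of their posts, and nothing else.
const query = `
  query {
    user(id: "1") {
      name
      posts { title }
    }
  }
`;

// Hypothetical selection shape: `true` keeps a scalar field, a nested
// object selects sub-fields (applied to each element when the value is a list).
type Selection = { [field: string]: true | Selection };

// Toy stand-in for resolving a selection: copy only the requested fields.
function select(source: unknown, selection: Selection): unknown {
  if (Array.isArray(source)) {
    return source.map((item) => select(item, selection));
  }
  const record = source as Record<string, unknown>;
  const result: Record<string, unknown> = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? record[field] : select(record[field], sub);
  }
  return result;
}

// Full server-side record, including fields the client did not ask for.
const fullUser = {
  id: "1",
  name: "Ada",
  email: "ada@example.com",
  posts: [{ id: "p1", title: "Hello", body: "..." }],
};

// Mirrors the query above: only `name` and `posts.title` come back.
const response = select(fullUser, { name: true, posts: { title: true } });
// response is { name: "Ada", posts: [{ title: "Hello" }] }
```

The `email`, `id`, and `body` fields never reach the client, which is the over-fetching problem the query language is meant to avoid.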

Core Differences Between GraphQL and REST APIs

| Feature | REST API | GraphQL |
| --- | --- | --- |
| Endpoint | Typically has multiple endpoints (e.g., `/users`, `/posts/:id`) | Typically has a single endpoint (e.g., `/graphql`) |
| Data Fetching | Prone to over-fetching or under-fetching | Precise fetching: you get exactly what you ask for |
| Number of Requests | May require multiple requests to fetch related data (e.g., fetching a post and its author) | Usually requires only one request to fetch all needed related data |
| Schema Definition | No standardized way to define the structure | Has a strongly typed schema that defines all the capabilities of the API |

Apollo Client: More Than Just a Request Tool

If GraphQL is the query language, Apollo Client is a widely used library for working with that language on the client side (e.g., in React, Vue, Angular, or native applications). It integrates data fetching, state management, and UI updates.

Apollo Cache: The Frontend's "Single Source of Truth"

At the heart of Apollo Client is an advanced, normalized in-memory cache. This means that instead of just storing the entire JSON response for each query, it breaks down the response into individual objects, assigns each object a unique identifier, and stores them in a flattened key-value store.

This unique identifier is the foundation of all the cache's automatic updates. By default, it is composed of the object's __typename (the GraphQL type name) and its id (or _id) field. For example, a to-do item whose id is 123 gets the identifier Todo:123.
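The normalization described above can be sketched with a toy normalizer. This is a simplified model written for this article, not Apollo Client's implementation; the `normalize`, `cacheId`, and `isEntity` helpers and the sample `Todo`/`User` data are assumptions. The `{ __ref }` reference shape and the `ROOT_QUERY` key are modeled on what a normalized Apollo cache looks like when inspected.

```typescript
// A reference to a normalized entity, stored in place of the entity itself.
type Ref = { __ref: string };
type Entity = { __typename: string; id: string; [field: string]: unknown };
type Store = Record<string, Record<string, unknown>>;

// Default-style cache ID: "<__typename>:<id>", e.g. "Todo:123".
function cacheId(entity: Entity): string {
  return `${entity.__typename}:${entity.id}`;
}

function isEntity(value: unknown): value is Entity {
  return typeof value === "object" && value !== null &&
    "__typename" in value && "id" in value;
}

// Toy normalizer: stores each entity once under its cache ID and returns
// a reference that a parent object or list can hold instead.
function normalize(entity: Entity, store: Store): Ref {
  const flat: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(entity)) {
    flat[field] = isEntity(value) ? normalize(value, store) : value;
  }
  const id = cacheId(entity);
  store[id] = { ...store[id], ...flat };
  return { __ref: id };
}

// A query response with a nested owner object.
const todo: Entity = {
  __typename: "Todo",
  id: "123",
  text: "Write docs",
  owner: { __typename: "User", id: "7", name: "Ada" },
};

const store: Store = {};
store["ROOT_QUERY"] = { todos: [normalize(todo, store)] };
// store now has three flat entries: "Todo:123", "User:7", and "ROOT_QUERY".
```

The nested `owner` is stored once under `User:7`, and both the to-do item and the query result hold references rather than copies.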

Handling Cache Invalidation

Cache invalidation deals with the situation where the data in the cache is no longer in sync with the true data on the server. It is one of the most important concepts to understand when using Apollo Client.

Deep Dive: The Update Hierarchy and Ambiguity

To understand cache invalidation, you must first understand the three-tiered hierarchy of Apollo's cache updates:

Tier One: Object Identification. This is the foundation of all automation. Apollo Client must be able to uniquely identify every object in the cache.
Tier Two: Object Update. When a mutation returns an object that can be uniquely identified, Apollo Client merges the returned fields into the matching cache entry automatically. This is the highest level of automation it can achieve.
Tier Three: Collection Update. This is the highest and most complex tier, and it requires developer intervention. The result of a query (e.g., a list) is itself a record in the cache, but it stores references to the underlying objects rather than copies. When you add or delete an item, this list of references must be modified.

The core of the cache invalidation problem is that Apollo Client's automation reaches Tier Two but not Tier Three. The fundamental reason for this limitation is ambiguity.
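A Tier Two update can be pictured as a merge by key. The sketch below is a toy model under that assumption, with hypothetical `mergeResult` and `readTodos` helpers and sample data. It shows why a list needs no change when one of its members is updated: the list only holds references.

```typescript
type Ref = { __ref: string };
type Store = Record<string, Record<string, unknown>>;

// A normalized cache with one Todo and a list that references it.
const store: Store = {
  "Todo:1": { __typename: "Todo", id: "1", text: "Buy milk", completed: false },
  ROOT_QUERY: { todos: [{ __ref: "Todo:1" }] },
};

// Toy Tier Two update: merge a mutation result into the entry that
// shares its "<__typename>:<id>" key.
function mergeResult(
  result: { __typename: string; id: string; [field: string]: unknown }
): void {
  const key = `${result.__typename}:${result.id}`;
  store[key] = { ...store[key], ...result };
}

// Reading a list means following each reference to the current entity.
function readTodos(): Record<string, unknown>[] {
  const refs = store["ROOT_QUERY"]["todos"] as Ref[];
  return refs.map((ref) => store[ref.__ref]);
}

// Hypothetical mutation response: the server marks Todo:1 as completed.
mergeResult({ __typename: "Todo", id: "1", completed: true });

const todosAfterUpdate = readTodos();
// todosAfterUpdate[0] keeps its text and now has completed: true.
```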

Ambiguity: Apollo Client Lacks "Context"

When you execute a mutation to add a new item, Apollo Client is smart enough to normalize the new item returned by the mutation and place it in the cache. For example, if you add a Todo:3, it will be properly stored.

However, this newly created Todo:3 doesn't know which to-do list it belongs to.
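In the same toy model (again an illustration, not Apollo Client's internals, with made-up sample data), storing a newly created object leaves every existing list untouched:

```typescript
type Ref = { __ref: string };
type Store = Record<string, Record<string, unknown>>;

const store: Store = {
  "Todo:1": { __typename: "Todo", id: "1", text: "Buy milk", completed: false },
  "Todo:2": { __typename: "Todo", id: "2", text: "Pay rent", completed: true },
  ROOT_QUERY: { todos: [{ __ref: "Todo:1" }, { __ref: "Todo:2" }] },
};

// Hypothetical result of an "add todo" mutation.
const created = { __typename: "Todo", id: "3", text: "Call Bob", completed: false };

// Normalizing the result stores the object under its own key...
store[`${created.__typename}:${created.id}`] = created;

// ...but nothing tells the cache which lists should reference it.
const listedIds = (store["ROOT_QUERY"]["todos"] as Ref[]).map((ref) => ref.__ref);
const isCached = "Todo:3" in store;             // true
const isListed = listedIds.includes("Todo:3");  // false
```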

Imagine the various to-do lists that might exist in your application:

A list showing all to-do items (All Todos).
A list showing only incomplete to-do items (Incomplete Todos).
A list showing only to-do items due today (Today's Todos).
A list sorted by priority (Sorted by Priority).

When you add a new "incomplete," "due today" to-do item, should it be:

Added to the "All Todos" list? (Maybe)
Added to the "Incomplete Todos" list? (Maybe)
Added to the "Today's Todos" list? (Maybe)
Added to a list showing only completed items? (Definitely not)

Apollo Client cannot know this business logic. It only sees a new Todo object but has no idea which query conditions (filtering, sorting, pagination, etc.) it meets.

Specific Technical Scenarios

Let's look at a few specific scenarios to see why automatic updates would be dangerous and impractical:

Filtering: As in the example above, if your query is getTodos(status: 'INCOMPLETE'), Apollo can see the new Todo's fields but does not know that the status argument filters on the object's status field, so it cannot decide whether the object belongs in this specific query result.
Sorting: Suppose your list is sorted alphabetically (ASC). A new item should be inserted somewhere in the middle of the list, not simply added to the beginning or end. Apollo doesn't know your sorting rules.
Pagination: This is the classic example. Suppose you display 10 items per page and are currently viewing page 2 (items 11-20). What should happen when you add a new item?
- Should it be added to the end of page 1?
- Should it be added to the end of the very last page?
- Will it cause the last item on page 1 to be "pushed" to the beginning of page 2?
- The client is completely unaware of the server's pagination logic. Any guess would almost certainly be wrong.
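To make the scenarios above concrete, the sketch below keeps three cached lists and a set of developer-written rules, one per list. Everything here is an assumption for illustration: the list keys are only loosely modeled on a field name plus its arguments, and the `Rule` type, the `sort` argument, and the sample data are hypothetical. The point is that the rules encode knowledge the cache does not have.

```typescript
type Todo = { id: string; title: string; status: "INCOMPLETE" | "COMPLETE" };

const todosById: Record<string, Todo> = {
  "1": { id: "1", title: "Apples", status: "INCOMPLETE" },
  "2": { id: "2", title: "Cherries", status: "COMPLETE" },
};

// Three cached query results. To the cache these are opaque lists of IDs.
const cachedLists: Record<string, string[]> = {
  'getTodos({"status":"INCOMPLETE"})': ["1"],
  'getTodos({"status":"COMPLETE"})': ["2"],
  'getTodos({"sort":"TITLE_ASC"})': ["1", "2"],
};

// Business rules only the developer knows: how each list reacts to a new item.
type Rule = (ids: string[], added: Todo) => string[];
const rules: Record<string, Rule> = {
  'getTodos({"status":"INCOMPLETE"})': (ids, added) =>
    added.status === "INCOMPLETE" ? [...ids, added.id] : ids,
  'getTodos({"status":"COMPLETE"})': (ids, added) =>
    added.status === "COMPLETE" ? [...ids, added.id] : ids,
  'getTodos({"sort":"TITLE_ASC"})': (ids, added) =>
    [...ids, added.id].sort((a, b) =>
      todosById[a].title.localeCompare(todosById[b].title)),
};

const added: Todo = { id: "3", title: "Bananas", status: "INCOMPLETE" };
todosById[added.id] = added;

for (const key of Object.keys(cachedLists)) {
  cachedLists[key] = rules[key](cachedLists[key], added);
}
// INCOMPLETE list: ["1", "3"]; COMPLETE list: ["2"]; sorted list: ["1", "3", "2"]
```

Without the `rules` table, the only safe options are to leave the lists alone or to ask the server again.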

Because of this "ambiguity," Apollo Client hands the "decision-making power" back to the developer. The fields of a single object can be updated automatically because their identity is clear. However, the relationship between an object and a collection is ambiguous and filled with business logic, so it cannot be updated automatically and requires the developer to maintain this relationship manually.

Summary of Root Causes for Cache Invalidation

Object Lacks a Globally Unique Identifier (Tier One Failure): This is the most fundamental problem. If an object cannot be uniquely identified, Apollo cannot normalize it, and all subsequent automatic updates are impossible.
Mutation Doesn't Return Enough Information (Tier Two Failure): Even if an object has an ID, if the mutation doesn't return the updated object, Apollo has no new data to perform the update. This is a common cause of inconsistencies between the UI and backend data.
Change Affects a Collection (Tier Three Failure): Operations like adding, deleting, or reordering change the membership or order of a cached query result. The return value of a mutation (a single new object) is insufficient for Apollo to infer how to modify these lists of references, which are full of business logic.

Strategies for Cache Invalidation

Understanding the principles above makes the following strategies logical. Their goal is to manually bridge the "gap" between the update tiers. Think of it like telling a librarian how to handle a new book:

The update Function (Manual Update, the Preferred Solution for Tier Three). Equivalent to giving a clear command: "This new book is science fiction, please put it in section A of the sci-fi shelf." This is the most efficient method because it needs no extra network request.
- How it works: a. Use cache.readQuery to read the existing list from the cache. b. Modify the list based on the mutation's result. c. Use cache.writeQuery to write the modified list back to the cache.
- Use Case: The vast majority of list operations (add, delete).
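The three steps can be sketched against a minimal stand-in cache. `ToyCache` below only borrows the `readQuery` and `writeQuery` method names from the text; it stores whole results by query name and is not Apollo Client's implementation. The `GET_TODOS` name, the `addTodo` result field, and the `updateAfterAddTodo` function are hypothetical.

```typescript
type Todo = { __typename: "Todo"; id: string; text: string };
type TodosData = { todos: Todo[] };

// Minimal stand-in for the two cache methods named above.
class ToyCache {
  private results = new Map<string, TodosData>();

  readQuery(options: { query: string }): TodosData | null {
    return this.results.get(options.query) ?? null;
  }

  writeQuery(options: { query: string; data: TodosData }): void {
    this.results.set(options.query, options.data);
  }
}

const GET_TODOS = "GetTodos"; // hypothetical query name
const cache = new ToyCache();
cache.writeQuery({
  query: GET_TODOS,
  data: { todos: [{ __typename: "Todo", id: "1", text: "Buy milk" }] },
});

// Shaped like an update callback: it receives the cache and the mutation result.
function updateAfterAddTodo(
  target: ToyCache,
  result: { data: { addTodo: Todo } }
): void {
  // a. Read the existing list from the cache.
  const existing = target.readQuery({ query: GET_TODOS });
  if (existing === null) return; // the list was never fetched, nothing to update

  // b. Build a new list from the mutation's result without mutating the old one.
  const todos = [...existing.todos, result.data.addTodo];

  // c. Write the modified list back to the cache.
  target.writeQuery({ query: GET_TODOS, data: { todos } });
}

updateAfterAddTodo(cache, {
  data: { addTodo: { __typename: "Todo", id: "2", text: "Call Bob" } },
});
```

Appending to the end is itself a business decision made by the developer; a sorted or filtered list would need different logic in step b.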
refetchQueries (Refetching Queries, the Brute-Force Solution for Tier Three). Equivalent to telling the librarian: "I don't know where this book goes, so please just reorganize the entire sci-fi shelf." This costs an extra network round trip per query but is very direct.
- How it works: After a mutation succeeds, specify one or more queries for Apollo Client to re-request from the server, overwriting the old query result with the latest data.
- Use Case: For non-critical operations where performance is not a high priority, or when the update logic is extremely complex.
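The refetch approach can be modeled in a few lines. This is a toy simulation with an in-memory "server", not Apollo Client code; only the `refetchQueries` option name comes from the text, while `mutate`, `queries`, and the sample data are assumptions.

```typescript
type Todo = { id: string; text: string };

// Hypothetical in-memory "server" that owns the true list.
const serverTodos: Todo[] = [{ id: "1", text: "Buy milk" }];
let requestCount = 0;

// Each named query knows how to fetch its own fresh result.
const queries: Record<string, () => Todo[]> = {
  GetTodos: () => {
    requestCount += 1;
    return [...serverTodos];
  },
};

// Cached results by query name, filled by an initial fetch.
const cachedResults: Record<string, Todo[]> = {
  GetTodos: queries["GetTodos"](),
};

// Toy mutation runner: apply the change on the server, then re-run every
// query listed in `refetchQueries` and overwrite its cached result.
function mutate(change: () => void, options: { refetchQueries: string[] }): void {
  change();
  for (const name of options.refetchQueries) {
    cachedResults[name] = queries[name]();
  }
}

mutate(
  () => { serverTodos.push({ id: "2", text: "Call Bob" }); },
  { refetchQueries: ["GetTodos"] }
);
// cachedResults["GetTodos"] now has two items; requestCount is 2.
```

The cached list is correct without any client-side list logic, at the price of one extra request per refetched query.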
Configuring typePolicies (Solving Tier One at the Root). This strategy directly addresses the "lacks a unique identifier" problem.
- How it works: When creating the InMemoryCache, use typePolicies to specify keyFields for types that don't have a standard id.
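The sketch below shows the shape of such a configuration as a plain object, plus a toy `identify` function that mimics how an identifier could be derived from it. The `Book` type and `isbn` field are hypothetical, `identify` here is a local helper written for this example, and the serialized key format is modeled on how custom key fields typically appear in the cache, so treat the exact string as an assumption.

```typescript
// Same shape as the typePolicies option: for each type, the fields that
// together identify an object.
const typePolicies: Record<string, { keyFields: string[] }> = {
  Book: { keyFields: ["isbn"] },
};

type CachedObject = { __typename: string; [field: string]: unknown };

// Toy identifier function: uses configured keyFields when present, falls
// back to `id` or `_id`, and returns undefined when neither is available.
function identify(object: CachedObject): string | undefined {
  const policy = typePolicies[object.__typename];
  if (policy) {
    const key: Record<string, unknown> = {};
    for (const field of policy.keyFields) {
      key[field] = object[field];
    }
    return `${object.__typename}:${JSON.stringify(key)}`;
  }
  const id = object["id"] ?? object["_id"];
  if (id === undefined || id === null) return undefined;
  return `${object.__typename}:${String(id)}`;
}

const bookKey = identify({ __typename: "Book", isbn: "9780131103627", title: "C" });
const todoKey = identify({ __typename: "Todo", id: "123" });
const tagKey = identify({ __typename: "Tag", label: "urgent" });
// bookKey: 'Book:{"isbn":"9780131103627"}', todoKey: "Todo:123", tagKey: undefined
```

The `Tag` object shows the Tier One failure: with no id and no configured key fields, it cannot be normalized.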
Fetch Policies. These control the caching behavior of individual queries to ensure data freshness.
- cache-first (Default): Apollo Client checks the cache first. If the requested data is present, it's returned. If not, it sends a network request. This is great for data that doesn't change often.
- cache-and-network: This policy first returns data from the cache (for a fast UI response), then also sends a network request. When the fresh data arrives, it updates the UI again. This is ideal for data that needs to be up-to-date, but where an instant UI is also important.
- network-only: Always sends a network request and ignores the cache for reading, but still writes the response to the cache. Use this for critical data that must always be the latest from the server (e.g., a one-time password).
- cache-only: Only ever reads from the cache. It will throw an error if the data isn't in the cache. This is useful for when you are certain the data should already be available locally.
- no-cache: This is similar to network-only, but it also doesn't write the response data to the cache. This is useful for sensitive data that you do not want to store on the client.
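The five policies can be compared side by side with a small synchronous model. This is a simplified sketch, not Apollo Client's implementation: `runQuery`, `fetchFromServer`, and the string payloads are hypothetical, and the function returns every value the UI would receive, in order, instead of emitting them over time.

```typescript
type FetchPolicy =
  | "cache-first"
  | "cache-and-network"
  | "network-only"
  | "cache-only"
  | "no-cache";

const cache = new Map<string, string>();
let networkCalls = 0;

// Hypothetical server call.
function fetchFromServer(key: string): string {
  networkCalls += 1;
  return `fresh:${key}`;
}

// Returns every value the UI would receive for this query, in order.
function runQuery(key: string, policy: FetchPolicy): string[] {
  const cached = cache.get(key);
  const fromNetwork = (writeToCache: boolean): string => {
    const data = fetchFromServer(key);
    if (writeToCache) cache.set(key, data);
    return data;
  };

  switch (policy) {
    case "cache-first":
      return cached !== undefined ? [cached] : [fromNetwork(true)];
    case "cache-and-network":
      return cached !== undefined
        ? [cached, fromNetwork(true)]
        : [fromNetwork(true)];
    case "network-only":
      return [fromNetwork(true)];
    case "cache-only":
      if (cached === undefined) {
        throw new Error(`"${key}" is not in the cache`);
      }
      return [cached];
    case "no-cache":
      return [fromNetwork(false)];
  }
}

cache.set("todos", "stale:todos");
const first = runQuery("todos", "cache-first");       // ["stale:todos"]
const both = runQuery("todos", "cache-and-network");  // ["stale:todos", "fresh:todos"]
const secret = runQuery("otp", "no-cache");           // ["fresh:otp"], never cached
```

After these three calls the model has made two network requests, the cached "todos" entry has been refreshed, and "otp" is absent from the cache.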
Global Invalidation. In some cases, such as a user logging out, you need to invalidate the entire cache.
- client.resetStore(): Clears the current cache and re-executes all active queries.
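The behavior described for `client.resetStore()` can be pictured with the toy model below, which clears a cache and then re-runs the queries that are still active. Only the method name comes from the text; the local `resetStore` and `watchQuery` functions, the `Me` query, and the logout scenario are assumptions for illustration.

```typescript
// Hypothetical server-side state that changes when the user logs out.
let currentUser: string | null = "ada";

const fetchers: Record<string, () => string> = {
  Me: () => currentUser ?? "anonymous",
};

const cache = new Map<string, string>();
const activeQueries = new Set<string>();

// Register a query as active (shown on screen) and cache its first result.
function watchQuery(name: string): void {
  activeQueries.add(name);
  cache.set(name, fetchers[name]());
}

// Toy reset: drop everything, then re-run the queries that are still active.
function resetStore(): void {
  cache.clear();
  activeQueries.forEach((name) => {
    cache.set(name, fetchers[name]());
  });
}

watchQuery("Me");     // cache: Me -> "ada"
currentUser = null;   // the user logs out
resetStore();         // cache: Me -> "anonymous"
```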

Conclusion

GraphQL provides a more efficient, powerful, and flexible way to interact with APIs. Apollo Client takes GraphQL's capabilities to the next level, acting not just as a request library but as a comprehensive client-side state management solution.

Its core, the Apollo Cache, uses normalization to create a consistent "single source of truth" for your application. Mastering Apollo Client means understanding its update hierarchy, knowing why invalidation occurs, and choosing the most appropriate strategy to handle updates at each tier. With that understanding, you can build applications that respond quickly and stay consistent with the server.

anila.