Entity Identifiers
Every Formation entity carries two identifiers: the integer primary key stored in the database (SchemeId, AddressId, CompanyId, …), and the encoded Id string exposed by the API ("SC1b2Cd", "AD03KwA", …). Clients — the frontend, API consumers, anyone reading URLs — only ever see the encoded form. The integer is an internal detail.
This page explains what the encoded Id is, why Formation uses it, and the rules the rest of the stack follows because of it.
Table of Contents
- The Two-ID Pattern
- Why Not Just Expose the Integer PK?
- Why Not GUIDs?
- How the Encoding Works
- Consequences for API Design
- Consequences for the Frontend
- Gotchas
The Two-ID Pattern
BaseEntity declares both halves:

    public abstract class BaseEntity
    {
        [NotMapped]
        public string Id => EncodeIdentifier(DbId, GetType().Name);

        [NotMapped]
        public abstract int DbId { get; }
    }

Every concrete entity implements DbId by returning its underlying integer PK:

    public partial class Scheme : BaseEntity
    {
        public override int DbId => SchemeId;
        public int SchemeId { get; set; }
        …
    }

The [NotMapped] attributes mean neither is stored as a column. The database has only SchemeId (int identity) as the actual primary key. DbId is a runtime adapter; Id is a computed string derived from DbId plus the entity type name.
Result:
| Layer | What it uses |
|---|---|
| Database | Integer PKs / FKs (SchemeId, AddressId, …) |
| API requests | Id string in URLs, JSON payloads, patch paths |
| API responses | Id string on every entity; FKs replaced with nested { Id } objects |
| Frontend | Id exclusively — never touches the integer |
Why Not Just Expose the Integer PK?
Four concrete reasons, most-important first:
1. IDs are not enumerable
Integer PKs leak information about the dataset:
- GET /Schemes/1 followed by /Schemes/2, /Schemes/3 walks the entire scheme table.
- Creation rate is observable: if POST /Schemes returns id 1000 today and 1500 next week, the tenant’s activity level is now public.
- Total count is observable: GET /Schemes/<largest-known-id + 1> returning 404 puts an upper bound on the size of the table.
Every Formation user is authenticated (AAD) and trusted, so the enumeration attack surface is smaller than a public API’s, but the same information leaks through screen-shares, screenshots, and casual URL-sharing. Encoded IDs give little away at a glance: "SC1b2Cd" and "SCm4Kp1" don’t show creation order or the entity next door, and they can’t be guessed by counting upward.
2. URLs are shareable without embedding low-level schema
Integer PKs couple every public URL to the physical database schema. A migration that needs to renumber (say, moving from an identity column to a sequence, merging tenants, or reseeding after a rollback) breaks every bookmarked URL. Encoded IDs are computed from the integer plus the type name, so the same underlying row ends up with the same encoded ID as long as both stay stable. Changes to the encoding format are a one-time disruption (all URLs change); changes to the data are invisible.
3. Type tagging prevents cross-entity confusion
The leading characters of the encoded ID are determined by the entity type (SC for Scheme, AD for Address, CO for Company), because each type has its own zero-point. If a client accidentally plugs a scheme ID into an /Addresses/… route, the decode produces a nonsense integer rather than silently hitting a different entity’s row. Integer PKs have no such safety: 1 in the addresses table and 1 in the schemes table look the same.
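A minimal sketch of that mismatch, working on the 32-bit XOR step only and leaving out the base64 rendering. The two zero-point constants are made-up placeholders rather than the real hash-derived values, and the function name is illustrative.

```typescript
// Placeholder zero-points (assumptions for illustration only; the real
// values are derived from the type name).
const SCHEME_ZERO = 0x48a1f3c2;
const ADDRESS_ZERO = 0x0373e9b5;

// XOR a value with a type's zero-point. JavaScript's ^ operator already
// works on signed 32-bit integers, which matches a C# int.
function xorWithZeroPoint(value: number, zeroPoint: number): number {
  return value ^ zeroPoint;
}

// The 32-bit value that would be rendered as the scheme's encoded ID.
const schemeWire = xorWithZeroPoint(42, SCHEME_ZERO);

// Decoding with the matching zero-point recovers the key.
const asScheme = xorWithZeroPoint(schemeWire, SCHEME_ZERO);

// Decoding the same value as an Address yields 42 ^ SCHEME_ZERO ^ ADDRESS_ZERO,
// which is far outside any realistic key range.
const asAddress = xorWithZeroPoint(schemeWire, ADDRESS_ZERO);

console.log(asScheme, asAddress);
```

With these placeholder constants the mismatched decode lands above one billion. How a controller reacts to such a key is outside this sketch; the point is only that it does not quietly resolve to key 42 in the other table.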
4. Clients don’t need both forms
Exposing both SchemeId: 42 and Id: "SC1b2Cd" on every response would create a fork: some code uses one, other code uses the other, and the two drift. By making the integer internal-only, the frontend has a single source of truth. Patch paths, routing, and foreign-key references in nested objects all use the encoded form.
Why Not GUIDs?
GUIDs solve enumeration but cost elsewhere:
- Opaque and long. 7c5cb85e-4b36-4e8b-91be-7cff8fbd4223 is 36 characters; "SC1b2Cd" is under ten. In URLs, patches, and log lines, the short form is dramatically more usable.
- No type tag. A GUID doesn’t tell you what kind of entity it addresses. The two-character prefix on the encoded ID gives humans and tools a fighting chance.
- Index bloat. GUID primary keys fragment SQL Server’s clustered index unless you use sequential GUIDs, which reintroduces enumeration risk. Keeping the integer PK avoids that trade-off entirely; the encoded ID is a pure rendering concern.
- FK storage. SchemeId INT is 4 bytes; a GUID FK is 16. Across millions of rows of join tables (SchemeCompany, InvestmentCompany, …) that’s a real cost.
The encoded ID scheme gives the benefits of opaque IDs without the storage or indexing cost — the database stays on integers.
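To put rough numbers on the FK storage point, here is a back-of-envelope calculation. The row count and the two-FK-columns-per-row shape are illustrative assumptions, not measurements from Formation, and the figures ignore row overhead and index copies.

```typescript
// Bytes per key column in SQL Server: INT is 4, UNIQUEIDENTIFIER is 16.
const INT_BYTES = 4;
const GUID_BYTES = 16;

// Raw key storage for a join table with the given row count and number of
// key columns per row.
function keyBytes(rows: number, keyColumns: number, bytesPerKey: number): number {
  return rows * keyColumns * bytesPerKey;
}

// Illustrative numbers only: 10 million rows, two FK columns per row.
const rows = 10_000_000;
const intTotal = keyBytes(rows, 2, INT_BYTES); // 80,000,000 bytes
const guidTotal = keyBytes(rows, 2, GUID_BYTES); // 320,000,000 bytes

console.log(`int: ${intTotal / 1e6} MB, guid: ${guidTotal / 1e6} MB`);
```

The ratio is always 4x for the key columns themselves; any index that includes those keys carries the same multiplier.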
How the Encoding Works
All the logic lives in BaseEntity.cs:

    public static string EncodeIdentifier(int id, string typeName)
    {
        var zid = ZeroIdentifier(typeName);
        int xorResult = zid ^ id;

        byte[] bytes = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(bytes, xorResult);

        string base64 = Convert.ToBase64String(bytes);

        return base64.Replace('+', '-').Replace('/', '_');
    }

    private static int ZeroIdentifier(string className)
    {
        using var sha256 = System.Security.Cryptography.SHA256.Create();
        var bytes = System.Text.Encoding.UTF8.GetBytes(className);
        var hash = sha256.ComputeHash(bytes);
        return BitConverter.ToInt32(hash, 0);
    }

The steps:
- Take the entity type name (e.g. "Scheme").
- SHA-256 hash it; take the first 4 bytes as a 32-bit integer. Call this the zero-point for that type.
- XOR the zero-point with the database primary key.
- Encode the 4-byte result in base64, URL-safe (+/ become -_).
- Trim: the base64 form is 8 chars including padding; drop the padding, keep 6 characters. The excerpt above does not show this step.
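The steps above can be sketched in TypeScript on Node.js. This is an illustrative port, not Formation's code: the function names are lower-cased stand-ins, and reading the hash bytes as little-endian is an assumption that mirrors BitConverter.ToInt32 on little-endian hardware. Node's "base64url" encoding omits padding, which covers the trim step.

```typescript
import { createHash } from "node:crypto";

// Steps 1-2: hash the type name and read the first 4 bytes as a signed
// 32-bit integer (little-endian is an assumption, see the note above).
function zeroIdentifier(typeName: string): number {
  const hash = createHash("sha256").update(typeName, "utf8").digest();
  return hash.readInt32LE(0);
}

// Steps 3-5: XOR with the key, write the result big-endian, and render it
// as unpadded URL-safe base64.
function encodeIdentifier(id: number, typeName: string): string {
  const xorResult = zeroIdentifier(typeName) ^ id;
  const bytes = Buffer.alloc(4);
  bytes.writeInt32BE(xorResult, 0);
  return bytes.toString("base64url");
}

const encoded = encodeIdentifier(42, "Scheme");
console.log(encoded, encoded.length);
```

Four bytes always render as six base64url characters, so the length is fixed regardless of the key's magnitude.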
Decoding is the inverse:

    public static int DecodeIdentifier<T>(string encodedId) where T : BaseEntity
    {
        var zid = ZeroIdentifier(typeof(T).Name);
        return zid ^ EncodedToInt(encodedId);
    }

XOR is its own inverse, so applying the same zero-point twice recovers the original integer. DecodeIdentifier is called by every write controller’s LoadEntityAsync and by TryApplyEncodedIdFilter on the read path.
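A dependency-free sketch of the decode path, assuming six unpadded base64url characters as input. The encodedToInt function is a hypothetical stand-in for EncodedToInt (whose body is not shown here), and the zero-point is passed in as a plain number; the value used in the example is a placeholder, not a real hash-derived zero-point.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Turn 6 unpadded base64url characters back into a signed 32-bit integer,
// treating the 4 decoded bytes as big-endian.
function encodedToInt(encodedId: string): number {
  const c = Array.from(encodedId, (ch) => ALPHABET.indexOf(ch));
  if (c.length !== 6 || c.some((v) => v < 0)) {
    throw new Error("expected 6 base64url characters");
  }
  const b0 = (c[0] << 2) | (c[1] >> 4);
  const b1 = ((c[1] & 15) << 4) | (c[2] >> 2);
  const b2 = ((c[2] & 3) << 6) | c[3];
  const b3 = (c[4] << 2) | (c[5] >> 4);
  // Bitwise OR yields a signed 32-bit result, like a C# int.
  return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}

// XOR with the same zero-point that was used for encoding.
function decodeIdentifier(encodedId: string, zeroPoint: number): number {
  return zeroPoint ^ encodedToInt(encodedId);
}

// Placeholder zero-point for illustration only.
const PLACEHOLDER_ZERO = 0x48a1f3c2;
console.log(decodeIdentifier("SKHz6A", PLACEHOLDER_ZERO)); // 42
```

With a zero-point of 0 the XOR step drops out, which makes the base64 half easy to check in isolation: "AAAAKg" is the bytes 00 00 00 2A, so it decodes to 42.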
Properties of this scheme:
- Deterministic. Same type + same DbId always produces the same Id. Stable bookmarks.
- Type-scoped. A Scheme with DbId=1 and an Address with DbId=1 produce different encoded IDs because the zero-points are different.
- Distinct zero-points per type. SHA-256 of "Scheme" and "Address" differs, so their zero-points almost certainly differ too. The encoded ID spaces are not strictly disjoint (every type maps onto the same 32-bit range), but for realistic key ranges an ID issued for one type typically decodes to an implausible integer for another.
- Cheap. One hash per type (cacheable in principle; currently recomputed per call, but SHA-256 of a short string takes microseconds).
- Not a hash, not encryption. It’s obfuscation. Anyone with the source, which is open to any authenticated Formation user, can decode any ID. That’s fine: the goal is to stop accidental enumeration and schema leakage, not to protect secrets.
The two-character prefix (SC, AD, CO, …) is not computed from the letters of the type name. It appears because the zero-point fixes the high-order bytes of the XOR result, and for small keys those bytes, which become the leading base64 characters, never change. The prefix is therefore an emergent property of the hash. If you rename an entity type, every encoded ID for that type changes.
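Because a rename only swaps one zero-point for another, an ID issued under the old name can be translated mechanically, for instance when redirecting old URLs. The sketch below assumes both zero-points are known; the two constants are placeholders, and translateId is a hypothetical helper, not part of Formation.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Render a signed 32-bit integer as 6 unpadded base64url characters (big-endian).
function toBase64Url(value: number): string {
  const u = value >>> 0; // reinterpret the bits as unsigned
  const b0 = u >>> 24;
  const b1 = (u >>> 16) & 255;
  const b2 = (u >>> 8) & 255;
  const b3 = u & 255;
  const sextets = [
    b0 >> 2,
    ((b0 & 3) << 4) | (b1 >> 4),
    ((b1 & 15) << 2) | (b2 >> 6),
    b2 & 63,
    b3 >> 2,
    (b3 & 3) << 4,
  ];
  return sextets.map((s) => ALPHABET.charAt(s)).join("");
}

// Inverse of toBase64Url.
function fromBase64Url(text: string): number {
  const c = Array.from(text, (ch) => ALPHABET.indexOf(ch));
  if (c.length !== 6 || c.some((v) => v < 0)) {
    throw new Error("expected 6 base64url characters");
  }
  const b0 = (c[0] << 2) | (c[1] >> 4);
  const b1 = ((c[1] & 15) << 4) | (c[2] >> 2);
  const b2 = ((c[2] & 3) << 6) | c[3];
  const b3 = (c[4] << 2) | (c[5] >> 4);
  return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}

// Decode with the old zero-point, re-encode with the new one.
function translateId(oldId: string, oldZero: number, newZero: number): string {
  const dbId = fromBase64Url(oldId) ^ oldZero;
  return toBase64Url(dbId ^ newZero);
}

// Placeholder zero-points standing in for the old and new class names.
const OLD_ZERO = 0x48a1f3c2;
const NEW_ZERO = 0x0373e9b5;

const translated = translateId("SKHz6A", OLD_ZERO, NEW_ZERO);
console.log(translated); // "A3Ppnw"
```

Both strings address key 42; only the zero-point differs, so the leading characters change completely.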
Consequences for API Design
- URLs are always encoded. GET /Schemes/SC1b2Cd, never /Schemes/42. Controllers decode on entry via BaseEntity.DecodeIdentifier<Scheme>("SC1b2Cd").
- Foreign keys in responses are rendered as nested { Id } objects, not as raw integers:

      {
        "Id": "SC1b2Cd",
        "SchemeName": "…",
        "Address": { "Id": "AD03KwA", "AddressLine": "…" },
        "BuildingType": { "Id": "BT5pQz2" }
      }

  The AddressId FK column exists in the DB but is invisible in the API. JSON Patch uses a rewrite layer to translate /Address/Id patches back into AddressId FK updates.
- $filter=Id eq '<encoded>' needs special handling. Id is [NotMapped]; EF can’t translate it to SQL. TryApplyEncodedIdFilter (controller pattern → Encoded-Id Filtering) intercepts the filter before OData sees it, decodes the RHS, and rewrites to WHERE SchemeId = @decoded.
- $expand populates nested Id properties. Because Id is a computed property and the navigation target is a tracked entity, expanded relations get an Id string for free, with no extra projection step.
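The filter interception can be pictured as a string-level rewrite. This is only a conceptual sketch in TypeScript: the real TryApplyEncodedIdFilter is backend code whose implementation is not shown here, and the function name, the Decoder type, and the stub decoder below are all assumptions made for illustration.

```typescript
// Hypothetical decoder signature: maps an encoded ID to its integer key.
type Decoder = (encodedId: string) => number;

// Rewrite "Id eq '<encoded>'" into "<pkColumn> eq <integer>".
// Any filter that does not have exactly that shape is returned unchanged.
function rewriteEncodedIdFilter(
  filter: string,
  pkColumn: string,
  decode: Decoder,
): string {
  const match = /^\s*Id\s+eq\s+'([A-Za-z0-9_-]+)'\s*$/.exec(filter);
  if (match === null) {
    return filter;
  }
  return `${pkColumn} eq ${decode(match[1])}`;
}

// Stub decoder for the example only; a real one would apply the type's zero-point.
const stubDecode: Decoder = (encodedId) => (encodedId === "SC1b2Cd" ? 42 : -1);

const rewrittenFilter = rewriteEncodedIdFilter("Id eq 'SC1b2Cd'", "SchemeId", stubDecode);
const untouchedFilter = rewriteEncodedIdFilter("SchemeName eq 'x'", "SchemeId", stubDecode);

console.log(rewrittenFilter); // SchemeId eq 42
console.log(untouchedFilter); // SchemeName eq 'x'
```

Passing unrecognised filters through untouched keeps the rewrite narrow: only the one shape that cannot be translated to SQL is handled specially.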
Consequences for the Frontend
The frontend rule in CLAUDE.md: always use the encoded Id, never the database FK field.

    // ✅ Correct
    const id = entity.Id // "SC1b2Cd"
    navigate(`/schemes/${entity.Id}`)

    // ❌ Wrong: internal DB FK, not part of the API contract
    const id = entity.AddressId // 42
    navigate(`/addresses/${entity.AddressId}`)

The reason AddressId appears on the frontend type at all is that TypeScript types are generated from EF Core entity shapes, which include all columns. The rule is enforced by code review rather than by the type system; a lint rule such as no-database-fk could be added in future.
Patch diffs include the rewrite machinery:
    // Frontend calculates the diff as nested object paths:
    buildPatch(original, updated)
    // → [{ op: "replace", path: "/Address/Id", value: "AD07Bzq" }]

    // Backend's PatchRewriter transforms before applying to EF:
    // → [{ op: "replace", path: "/AddressId", value: 42 }]

The frontend never sees the FK integer; the backend never sees the encoded string once it’s past the write controller.
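The rewrite step can be sketched as a pure function over patch operations. This is a TypeScript illustration of the idea, not the backend's PatchRewriter: the PatchOp shape, the Decoder signature, the stub decoder, and the sample SchemeName value are all assumptions.

```typescript
interface PatchOp {
  op: "add" | "replace" | "remove";
  path: string;
  value?: unknown;
}

// Hypothetical decoder: given the navigation name (e.g. "Address") and the
// encoded ID, return the integer key.
type Decoder = (navigation: string, encodedId: string) => number;

// Rewrite "/<Nav>/Id" operations into "/<Nav>Id" carrying the decoded
// integer. Every other operation passes through untouched.
function rewritePatch(ops: PatchOp[], decode: Decoder): PatchOp[] {
  return ops.map((op) => {
    const match = /^\/([A-Za-z]+)\/Id$/.exec(op.path);
    if (match === null || typeof op.value !== "string") {
      return op;
    }
    const navigation = match[1];
    return { ...op, path: `/${navigation}Id`, value: decode(navigation, op.value) };
  });
}

// Stub decoder for the example only.
const stubDecode: Decoder = (navigation, encodedId) =>
  navigation === "Address" && encodedId === "AD07Bzq" ? 42 : -1;

const incoming: PatchOp[] = [
  { op: "replace", path: "/Address/Id", value: "AD07Bzq" },
  { op: "replace", path: "/SchemeName", value: "Riverside" },
];

const rewritten = rewritePatch(incoming, stubDecode);
console.log(JSON.stringify(rewritten));
```

The decoder needs the navigation name because the zero-point depends on the target entity type, not on the entity being patched.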
Gotchas
- Don’t log DbId. Logs are sometimes shared across environments. Emitting DbId leaks the enumeration you worked to hide. Log Id instead (a one-line rule, yet several commits exist only to fix violations of it).
- Don’t URL-encode the encoded ID. Because the base64-URL variant produces only A–Z, a–z, 0–9, -, _, the encoded ID is already URL-safe. An encodeURIComponent/decodeURIComponent round-trip is a no-op in practice; some helpers still encode defensively, which is harmless but unnecessary.
- Tests need decoded integer IDs for .FindAsync(id) calls. EF’s .FindAsync(key) wants the real PK. Test setup should use the integer seeded into the DB, not the encoded form. Write controllers decode automatically; test helpers may not.
- Renaming an entity class breaks every existing URL for that type. The zero-point is derived from the type name. If Scheme becomes Development tomorrow, every scheme’s encoded ID changes because the hash input changed. This is an acceptable trade-off for the stability it gives in every other case, but plan the migration: redirect the old URLs, or preserve the encoding function by keeping the old class name as a string override.
- Don’t expect encoded IDs to be unique across entity types. The output carries only 4 bytes. Two different DbId values never collide within a single entity type (XOR with a fixed zero-point is a bijection), but every type maps onto the same 32-bit space, and the scheme is weak obfuscation rather than a hash or encryption.
- Don’t use Id as a dictionary key in hot loops. It’s a property getter that recomputes SHA-256 of the type name on every call. For tight loops, snapshot the integer PK or the encoded string once. The API-boundary cost is negligible; an inner loop might notice.
- DbId is abstract and must be overridden in every concrete entity. The C# compiler enforces this: a non-abstract class that omits the override fails to build (error CS0534), so the omission cannot surface at runtime. What the compiler cannot check is that the override returns the right key property, so verify that in review.
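For the hot-loop gotcha, one option is to cache the zero-point per type name so the hash runs at most once per type. The sketch below shows the caching idea in TypeScript with a stand-in hash that only counts calls; memoizeZeroPoint is a hypothetical helper, and nothing here reflects how Formation currently computes the value.

```typescript
// Wrap any zero-point function (for example a SHA-256-based one) so each
// type name is computed at most once.
function memoizeZeroPoint(
  compute: (typeName: string) => number,
): (typeName: string) => number {
  const cache = new Map<string, number>();
  return (typeName) => {
    let zeroPoint = cache.get(typeName);
    if (zeroPoint === undefined) {
      zeroPoint = compute(typeName);
      cache.set(typeName, zeroPoint);
    }
    return zeroPoint;
  };
}

// Stand-in for the real hash, used only to count invocations.
let calls = 0;
const fakeHash = (typeName: string): number => {
  calls += 1;
  return typeName.length;
};

const zeroPointFor = memoizeZeroPoint(fakeHash);
zeroPointFor("Scheme");
zeroPointFor("Scheme");
zeroPointFor("Address");

console.log(calls); // 2
```

Three lookups trigger two computations, one per distinct type name. Since the set of entity types is small and fixed, the cache never grows beyond a handful of entries.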
See also
- Controller pattern → Encoded-Id Filtering: how $filter=Id eq … is intercepted
- JSON Patch: how nested /Address/Id paths become FK updates
- Architecture → Identity Model: top-level overview