
Entity Identifiers

Every Formation entity carries two identifiers: the integer primary key stored in the database (SchemeId, AddressId, CompanyId, …), and the encoded Id string exposed by the API ("SC1b2Cd", "AD03KwA", …). Clients — the frontend, API consumers, anyone reading URLs — only ever see the encoded form. The integer is an internal detail.

This page explains what the encoded Id is, why Formation uses it, and the rules the rest of the stack follows because of it.

BaseEntity declares both halves:

src/common/models/Models/BaseEntity.cs
public abstract class BaseEntity
{
    [NotMapped]
    public string Id => EncodeIdentifier(DbId, GetType().Name);

    [NotMapped]
    public abstract int DbId { get; }
}

Every concrete entity implements DbId by returning its underlying integer PK:

public partial class Scheme : BaseEntity
{
    public override int DbId => SchemeId;
    public int SchemeId { get; set; }
}

The [NotMapped] attributes mean neither is stored as a column. The database has only SchemeId (int identity) as the actual primary key. DbId is a runtime adapter; Id is a computed string derived from DbId plus the entity type name.

Result:

Layer | What it uses
Database | Integer PKs / FKs (SchemeId, AddressId, …)
API requests | Id string in URLs, JSON payloads, patch paths
API responses | Id string on every entity; FKs replaced with nested { Id } objects
Frontend | Id exclusively; never touches the integer

Four concrete reasons, most-important first:

Integer PKs leak information about the dataset:

  • GET /Schemes/1 followed by /Schemes/2, /Schemes/3 walks the entire scheme table.
  • Creation rate is observable: if POST /Schemes returns id 1000 today and 1500 next week, anyone who sees both IDs can read off the tenant’s activity level.
  • Total count is observable: GET /Schemes/<largest-known-id + 1> returning 404 puts an upper bound on the size of the table.

Every Formation user is authenticated (AAD) and trusted, so the enumeration attack surface is smaller than a public API’s, but the same information leaks through screen-shares, screenshots, and casual URL-sharing. Encoded IDs give far less away at a glance: "SC1b2Cd" and "SCm4Kp1" don’t visibly reveal creation order or the entity next door, and nobody can step through them by incrementing a number in the address bar.

2. URLs are shareable without embedding low-level schema


Integer PKs couple every public URL to the physical database schema. A migration that needs to renumber (say, moving from an identity column to a sequence, merging tenants, or reseeding after a rollback) breaks every bookmarked URL. Encoded IDs are computed from the integer plus the type name, so a row keeps the same encoded ID only while both stay stable; a renumbering migration still changes it. Changes to the encoding format are a one-time disruption (all URLs change); schema changes that leave the PK values and the type name alone are invisible to clients.

3. Type tagging prevents cross-entity confusion


The leading characters of the encoded ID are specific to the entity type (shown as SC for Scheme, AD for Address, CO for Company in this page’s examples). If a client accidentally plugs a scheme ID into an /Addresses/… route, the decode produces a nonsense integer that almost certainly matches no row, rather than silently hitting a different entity’s row. Integer PKs have no such safety: 1 in the addresses table and 1 in the schemes table look the same.
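The cross-type behaviour follows directly from the XOR. The sketch below works on the raw integers only (the base64 step doesn't change the argument) and uses two made-up zero-points; the real values come from SHA-256 of each type name and are not shown on this page.

```typescript
// Hypothetical zero-points, chosen for illustration only.
const SCHEME_ZERO = 0x5c1e9a37;
const ADDRESS_ZERO = 0x03d2b0c4;

const schemeDbId = 42;

// The integer an encoded scheme Id carries before it is base64-encoded.
const onTheWire = SCHEME_ZERO ^ schemeDbId;

// Decoding against the right type recovers the key...
const decodedAsScheme = SCHEME_ZERO ^ onTheWire;

// ...decoding against the wrong type yields SCHEME_ZERO ^ ADDRESS_ZERO ^ 42,
// which is dominated by the difference between the two zero-points.
const decodedAsAddress = ADDRESS_ZERO ^ onTheWire;

console.log(decodedAsScheme);          // 42
console.log(decodedAsAddress !== 42);  // true
```

The wrong-type result is an arbitrary 32-bit value, so it is very unlikely to match an existing row, though nothing in the scheme guarantees it.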

Exposing both SchemeId: 42 and Id: "SC1b2Cd" on every response would create a fork: some code would use one, other code the other, and the two would drift. Making the integer internal-only gives the frontend a single source of truth. Patch paths, routing, and foreign-key references in nested objects all use the encoded form.

GUIDs solve enumeration but cost elsewhere:

  • Opaque and long. 7c5cb85e-4b36-4e8b-91be-7cff8fbd4223 is 36 characters; an encoded ID such as "SC1b2Cd" is a fraction of that. In URLs, patches, and log lines, the short form is dramatically more usable.
  • No type tag. A GUID doesn’t tell you what kind of entity it addresses. The two-character prefix on the encoded ID gives humans and tools a fighting chance.
  • Index bloat. GUID primary keys fragment SQL Server’s clustered index unless you use sequential GUIDs, which reintroduces enumeration risk. Keeping the integer PK avoids that trade-off entirely — the encoded ID is a pure rendering concern.
  • FK storage. SchemeId INT is 4 bytes; a GUID FK is 16. Across millions of rows of join tables (SchemeCompany, InvestmentCompany, …) that’s a real cost.

The encoded ID scheme gives the benefits of opaque IDs without the storage or indexing cost — the database stays on integers.

All the logic lives in BaseEntity.cs:

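// Requires: using System.Buffers.Binary; (for BinaryPrimitives)
public static string EncodeIdentifier(int id, string typeName)
{
    // Per-type zero-point, XORed with the primary key.
    var zid = ZeroIdentifier(typeName);
    int xorResult = zid ^ id;

    // Serialise as 4 big-endian bytes, then base64.
    byte[] bytes = new byte[4];
    BinaryPrimitives.WriteInt32BigEndian(bytes, xorResult);
    string base64 = Convert.ToBase64String(bytes);

    // URL-safe alphabet; drop the "==" padding (step 5 below).
    return base64.Replace('+', '-').Replace('/', '_').TrimEnd('=');
}

private static int ZeroIdentifier(string className)
{
    using var sha256 = System.Security.Cryptography.SHA256.Create();
    var bytes = System.Text.Encoding.UTF8.GetBytes(className);
    var hash = sha256.ComputeHash(bytes);
    // First 4 hash bytes as an int (byte order follows the host's endianness).
    return BitConverter.ToInt32(hash, 0);
}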

The steps:

  1. Take the entity type name (e.g. "Scheme").
  2. SHA-256 hash it; take the first 4 bytes as a 32-bit integer. Call this the zero-point for that type.
  3. XOR the zero-point with the database primary key.
  4. Encode the 4-byte result in base64, URL-safe (+/ become -_).
  5. Trim — the base64 form is 8 chars including padding; drop the padding, keep 6 characters.
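The steps above can be sketched in TypeScript as follows. This is an illustration, not a port of BaseEntity.cs: the zero-point is passed in as a parameter so the sketch needs no crypto library, and HYPOTHETICAL_ZERO_POINT is a made-up constant rather than the hash of any real type name.

```typescript
// URL-safe base64 alphabet: '+' and '/' are replaced by '-' and '_'.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// zeroPoint stands in for "first 4 bytes of SHA-256(typeName)" (steps 1-2).
function encodeIdentifier(dbId: number, zeroPoint: number): string {
  // Step 3: XOR, then view the result as an unsigned 32-bit value.
  const x = (zeroPoint ^ dbId) >>> 0;

  // Steps 4-5: base64 of the 4 big-endian bytes is six 6-bit groups;
  // the two '=' padding characters are simply never emitted.
  const sextets = [
    (x >>> 26) & 63,
    (x >>> 20) & 63,
    (x >>> 14) & 63,
    (x >>> 8) & 63,
    (x >>> 2) & 63,
    (x & 3) << 4, // last 2 bits, zero-filled to a full group
  ];
  return sextets.map((s) => ALPHABET.charAt(s)).join("");
}

const HYPOTHETICAL_ZERO_POINT = 0x1234abcd; // assumption, for illustration

console.log(encodeIdentifier(42, HYPOTHETICAL_ZERO_POINT)); // "EjSr5w"
console.log(encodeIdentifier(1, 0));                        // "AAAAAQ"
```

With a zero-point of 0 the output is just the URL-safe base64 of the integer, which makes the second line easy to check against any base64 tool (bytes 00 00 00 01 encode to AAAAAQ==).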

Decoding is the inverse:

public static int DecodeIdentifier<T>(string encodedId) where T : BaseEntity
{
    var zid = ZeroIdentifier(typeof(T).Name);
    return zid ^ EncodedToInt(encodedId);
}

XOR is its own inverse, so applying the same zero-point twice recovers the original integer. DecodeIdentifier is called by every write controller’s LoadEntityAsync and by TryApplyEncodedIdFilter on the read path.
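EncodedToInt is not shown on this page, so the following TypeScript sketch is an assumption about what the inverse looks like for a 6-character unpadded ID: rebuild the 32-bit value from the six base64 characters, then XOR with the same zero-point. The zero-point is again a parameter, and 0x1234abcd is a made-up value.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function decodeIdentifier(encodedId: string, zeroPoint: number): number {
  // Reject anything that is not six URL-safe base64 characters.
  if (!/^[A-Za-z0-9_-]{6}$/.test(encodedId)) {
    throw new Error(`Malformed encoded id: ${encodedId}`);
  }

  // The first five characters contribute 6 bits each (30 bits in total).
  let x = 0;
  for (let i = 0; i < 5; i++) {
    x = (x << 6) | ALPHABET.indexOf(encodedId.charAt(i));
  }
  // The sixth character carries only the final 2 bits in its top positions.
  x = (x << 2) | (ALPHABET.indexOf(encodedId.charAt(5)) >> 4);

  // XOR with the same zero-point undoes the encode; `| 0` keeps a signed int32.
  return (x ^ zeroPoint) | 0;
}

console.log(decodeIdentifier("EjSr5w", 0x1234abcd)); // 42
console.log(decodeIdentifier("AAAAAQ", 0));          // 1
```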

Properties of this scheme:

  • Deterministic. Same type + same DbId always produces the same Id. Stable bookmarks.
  • Type-scoped. A Scheme with DbId=1 and an Address with DbId=1 produce different encoded IDs because the zero-points are different.
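  • No same-key collisions between types. "Scheme" and "Address" are different SHA-256 inputs, so in practice their zero-points differ and the same DbId never encodes to the same string for both types. The ID spaces themselves do overlap (each type’s IDs range over all 32-bit values), so an encoded ID is only meaningful together with the type it is decoded against.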
  • Cheap. One hash per type (cacheable in principle; currently recomputed per call, but SHA-256 of a short string is microseconds).
  • Not a hash, not encryption. It’s obfuscation. Anyone with the source — open to any authenticated Formation user — can decode any ID. That’s fine: the goal is to stop accidental enumeration and schema leakage, not to protect secrets.

Each type’s encoded IDs share a stable two-character prefix (written SC, AD, CO, … in this page’s examples). It is a natural consequence of the per-type zero-point: the first two characters carry the top 12 bits of the XOR result, and any DbId below 2^20 leaves those bits untouched. The prefix isn’t computed from the letters of the type name; it’s an emergent property of the hash. If you rename an entity type, every encoded ID for that type changes.
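A small TypeScript check makes the prefix behaviour concrete. The first two base64 characters encode bits 31..20 of the XOR result, so it is enough to look at those bits; the zero-point below is a made-up value, not the hash of a real type name.

```typescript
const HYPOTHETICAL_ZERO_POINT = 0x1234abcd; // assumption, for illustration

// Bits 31..20 of (zeroPoint XOR dbId): exactly what the first two
// base64 characters of the encoded ID represent.
function prefixBits(dbId: number, zeroPoint: number): number {
  return ((zeroPoint ^ dbId) >>> 20) & 0xfff;
}

const reference = prefixBits(1, HYPOTHETICAL_ZERO_POINT);

// Any key below 2^20 leaves the top 12 bits alone, so the prefix is stable.
const smallKeys = [2, 1000, (1 << 20) - 1];
console.log(
  smallKeys.every((id) => prefixBits(id, HYPOTHETICAL_ZERO_POINT) === reference)
); // true

// Bit 20 is the lowest bit of the second character, so 2^20 changes the prefix.
console.log(prefixBits(1 << 20, HYPOTHETICAL_ZERO_POINT) === reference); // false
```

The same arithmetic shows why neighbouring keys look alike: DbId 1 and DbId 2 differ only in the low bits, so their encoded IDs differ only in the trailing characters.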

  • URLs are always encoded. GET /Schemes/SC1b2Cd, never /Schemes/42. Controllers decode on entry via BaseEntity.DecodeIdentifier<Scheme>("SC1b2Cd").

  • Foreign keys in responses are rendered as nested { Id } objects, not as raw integers:

    {
      "Id": "SC1b2Cd",
      "SchemeName": "",
      "Address": { "Id": "AD03KwA", "AddressLine": "" },
      "BuildingType": { "Id": "BT5pQz2" }
    }

    The AddressId FK column exists in the DB but is invisible in the API. JSON Patch uses a rewrite layer to translate /Address/Id patches back into AddressId FK updates.

  • $filter=Id eq '<encoded>' needs special handling. Id is [NotMapped]; EF can’t translate it to SQL. TryApplyEncodedIdFilter (controller pattern → Encoded-Id Filtering) intercepts the filter before OData sees it, decodes the RHS, and rewrites to WHERE SchemeId = @decoded.

  • $expand populates nested Id properties. Because Id is a computed property and the navigation target is a tracked entity, expanded relations get an Id string for free — no extra projection step.
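To illustrate the filter rule, here is a rough TypeScript sketch of the rewrite idea behind TryApplyEncodedIdFilter. The real helper is C# and is not shown on this page; the function name, the regular expression, and the injected decoder are all assumptions made for the example.

```typescript
// Hypothetical helper: turn "Id eq '<encoded>'" into "<pkColumn> eq <integer>".
// Returns null when the filter is not an encoded-Id comparison, so the caller
// can hand the original filter to the normal OData pipeline.
function rewriteEncodedIdFilter(
  filter: string,
  pkColumn: string,
  decode: (encodedId: string) => number
): string | null {
  const match = /^\s*Id\s+eq\s+'([A-Za-z0-9_-]+)'\s*$/.exec(filter);
  if (match === null) {
    return null;
  }
  const encodedId = match[1] ?? "";
  return `${pkColumn} eq ${decode(encodedId)}`;
}

// Stand-in decoder for the example; a real one would XOR with the zero-point.
const hypotheticalDecode = (encodedId: string): number =>
  encodedId === "SC1b2Cd" ? 42 : -1;

console.log(rewriteEncodedIdFilter("Id eq 'SC1b2Cd'", "SchemeId", hypotheticalDecode));
// "SchemeId eq 42"
console.log(rewriteEncodedIdFilter("SchemeName eq 'x'", "SchemeId", hypotheticalDecode));
// null
```

Only the exact "Id eq '...'" shape is handled here; compound filters would need a proper parser rather than a regular expression.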

The frontend rule in CLAUDE.md: always use the encoded Id, never the database FK field.

// ✅ Correct
const schemeId = entity.Id // "SC1b2Cd"
navigate(`/schemes/${entity.Id}`)
// ❌ Wrong: AddressId is the internal DB FK, not part of the API contract
const addressFk = entity.AddressId // 42
navigate(`/addresses/${entity.AddressId}`)

AddressId appears on the frontend type only because TypeScript types are generated from EF Core entity shapes, which include all columns. The rule is enforced by code review rather than by the type system; a lint rule (no-database-fk) could be added in future.
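As one possible alternative to review-only enforcement, a mapped type can hide the FK fields at compile time. This is a sketch, not something Formation ships: ApiShape, SchemeRow, and toApiShape are hypothetical names, and the "ends in Id" convention is an assumption about how FK columns are named.

```typescript
// Keep "Id"; drop every other key that ends in "Id" (the raw FK/PK columns).
type ApiShape<T> = {
  [K in keyof T as K extends "Id" ? K : K extends `${string}Id` ? never : K]: T[K];
};

// Hypothetical generated type mirroring an EF Core entity shape.
interface SchemeRow {
  Id: string;
  SchemeId: number;
  AddressId: number;
  SchemeName: string;
}

// Runtime counterpart, so the object matches the narrowed type.
function toApiShape<T extends object>(entity: T): ApiShape<T> {
  const out: Record<string, unknown> = {};
  for (const key in entity) {
    if (key === "Id" || !/Id$/.test(key)) {
      out[key] = entity[key];
    }
  }
  return out as unknown as ApiShape<T>;
}

const scheme = toApiShape<SchemeRow>({
  Id: "SC1b2Cd",
  SchemeId: 42,
  AddressId: 7,
  SchemeName: "Example",
});

console.log(scheme.Id);           // "SC1b2Cd"
console.log(Object.keys(scheme)); // [ "Id", "SchemeName" ]
// scheme.AddressId would not compile: the property is absent from ApiShape<SchemeRow>.
```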

Patch diffs go through the same rewrite machinery:

// Frontend calculates the diff as nested object paths:
buildPatch(original, updated) →
[{ op: "replace", path: "/Address/Id", value: "AD07Bzq" }]
// Backend's PatchRewriter transforms before applying to EF:
[{ op: "replace", path: "/AddressId", value: 42 }]

The frontend never sees the FK integer; the backend never sees the encoded string once it’s past the write controller.
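The transformation can be sketched in TypeScript as below. The real PatchRewriter runs in the C# backend and its code is not shown on this page, so the function name, the path pattern, and the injected decoder are assumptions for illustration.

```typescript
interface PatchOp {
  op: "add" | "replace" | "remove";
  path: string;
  value?: unknown;
}

// Injected decoder: (navigation name, encoded Id) => integer primary key.
type Decoder = (navigation: string, encodedId: string) => number;

// Rewrite "/<Navigation>/Id" operations into "/<Navigation>Id" FK updates;
// every other operation is passed through untouched.
function rewriteFkPatches(ops: PatchOp[], decode: Decoder): PatchOp[] {
  return ops.map((op) => {
    const match = /^\/([A-Za-z]+)\/Id$/.exec(op.path);
    if (match === null || typeof op.value !== "string") {
      return op;
    }
    const navigation = match[1] ?? "";
    return {
      op: op.op,
      path: `/${navigation}Id`,
      value: decode(navigation, op.value),
    };
  });
}

// Stand-in decoder that always answers 42, purely for the example.
const rewritten = rewriteFkPatches(
  [
    { op: "replace", path: "/Address/Id", value: "AD07Bzq" },
    { op: "replace", path: "/SchemeName", value: "Renamed" },
  ],
  () => 42
);

console.log(JSON.stringify(rewritten));
// [{"op":"replace","path":"/AddressId","value":42},{"op":"replace","path":"/SchemeName","value":"Renamed"}]
```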

  1. Don’t log DbId. Logs are sometimes shared across environments, and emitting DbId leaks the enumeration you worked to hide. Log Id instead. It is a one-line rule, and several commits have had to fix violations of it.

  2. Don’t URL-encode the encoded ID. Because the base64-URL variant produces only A–Z, a–z, 0–9, -, _, the encoded ID is already URL-safe. An encodeURIComponent / decodeURIComponent round-trip is a no-op in practice; some helpers still encode defensively, which is harmless but unnecessary.

  3. Tests need decoded integer IDs for .FindAsync(id) calls. EF’s .FindAsync(key) wants the real PK. Test setup should use the integer seeded into the DB, not the encoded form. Write controllers decode automatically; test helpers may not.

  4. Renaming an entity class breaks every existing URL for that type. The zero-point is derived from the type name. If Scheme becomes Development tomorrow, every scheme’s encoded ID changes because the hash input changed. This is an acceptable trade-off for the stability it gives in every other case, but plan the migration: redirect the old URLs, or preserve the encoding function by keeping the old class name as a string override.

  5. Don’t expect the encoded ID to be unique across entity types. The output is only 4 bytes, and every type uses the whole 32-bit space. Two different DbId values never collide within a single entity type (XOR with a fixed zero-point is a bijection), but the same string can be a valid ID for two different types. This is weak obfuscation, not a cryptographic hash.

  6. Don’t use Id as a dictionary key in hot loops. It’s a property getter that recomputes SHA-256 of the type name every call. For tight loops, snapshot the integer PK or the encoded string once. The API-boundary cost is negligible; an inner loop might notice.

  7. DbId is abstract and must be overridden in every concrete entity. The C# compiler enforces this: a non-abstract class that omits the override fails to build (error CS0534), so a missing override cannot reach runtime. What the compiler can’t check is that the override returns the right property; an override pointing at the wrong column produces IDs that decode to the wrong rows.
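For gotcha 6, the usual remedy is to compute each type's zero-point once and reuse it. A minimal TypeScript sketch of that idea follows; memoizeZeroPoint is a hypothetical helper, and the compute function is a stand-in for the SHA-256 step (the backend equivalent would be a static cache in C#).

```typescript
// Wrap an expensive "type name -> zero-point" function with a per-name cache.
function memoizeZeroPoint(
  computeZeroPoint: (typeName: string) => number
): (typeName: string) => number {
  const cache = new Map<string, number>();
  return (typeName) => {
    const hit = cache.get(typeName);
    if (hit !== undefined) {
      return hit;
    }
    const value = computeZeroPoint(typeName);
    cache.set(typeName, value);
    return value;
  };
}

// Stand-in for the hash: counts how often the expensive path runs.
let hashCalls = 0;
const zeroPointFor = memoizeZeroPoint((typeName) => {
  hashCalls += 1;
  return typeName.length * 0x01010101; // placeholder arithmetic, not SHA-256
});

for (let i = 0; i < 1000; i++) {
  zeroPointFor("Scheme");
}
zeroPointFor("Address");

console.log(hashCalls); // 2
```

A thousand lookups for the same type trigger one computation, which is the behaviour a hot loop needs; snapshotting the encoded string once, as the gotcha suggests, achieves the same effect without any cache.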