
Background Jobs

Formation runs five Container App Jobs alongside the long-lived API and web apps. Each has its own Dockerfile, its own managed identity, and its own trigger model (event / schedule / manual). They share the same SQL database and Key Vault as the API, but none of them accept HTTP traffic — jobs are scheduled or enqueued, never called directly.

This page documents each job: what it does, what triggers it, where its source lives, and the external dependencies it owns. For the topology / scaling / identity detail, see Deployment Topology → Jobs.

| Job | Container App | Source | Trigger | Typical frequency |
| --- | --- | --- | --- | --- |
| Data Load | ca-jobload | load/ | Azure Storage Queue (KEDA scaler) | Event-driven — one execution per import file |
| Completeness Score | ca-jobcompscore | completionscore/ | Manual / scheduled | Nightly |
| Query View Rebuild | ca-jobqueryvws | rebuildqueryviews/ | Manual / scheduled | On demand after schema / mapper changes |
| Currency Import | ca-jobcurimp | currencyimport/ | Scheduled (daily) | Daily |
| Duplicate Detection | ca-jobdedup | duplicatedetection/ | Manual | Weekly-ish |

All jobs share the pattern described in Shared Patterns below: IHostedService worker, IJobProgressService for tracking, env-var-driven trigger metadata, graceful cancellation.

Data Load

Purpose. Ingest bulk data from files dropped into the data-load blob container. A message on the data-load storage queue points the job at a file; the worker parses it, dispatches the appropriate LiteBus commands, and marks the job complete.

Trigger. KEDA Azure Storage Queue scaler. When a message arrives on the queue, a replica spins up, processes it, then scales back to zero.
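
In the job's Bicep, that event trigger might look like the following sketch — the queue name, secret reference, and thresholds here are illustrative assumptions, not the deployed values:

```bicep
// Illustrative eventTriggerConfig for ca-jobload (names/values are assumptions).
eventTriggerConfig: {
  scale: {
    minExecutions: 0          // scale to zero between messages
    maxExecutions: 1          // one import file processed at a time
    rules: [
      {
        name: 'data-load-queue'
        type: 'azure-queue'   // KEDA Azure Storage Queue scaler
        metadata: {
          queueName: 'data-load'
          queueLength: '1'    // one execution per message
        }
        auth: [
          {
            secretRef: 'storage-connection-string'
            triggerParameter: 'connection'
          }
        ]
      }
    ]
  }
}
```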

Flow.

[blob drop] → [queue message] → jobload replica starts
parse file (CSV / XLSX)
dispatch CreateXxx / UpdateXxx commands via LiteBus
write JobExecution progress rows
acknowledge / delete queue message

Source and deployment. src/services/job/load/. Dockerised, published via dotnet-service-deploy.yml, deployed to ca-jobload Container App Job. Scales to zero between messages.

External dependencies. Azure Storage (queue + blob), SQL (writes through the same API command pipeline), Key Vault (SQL connection string).

Notes. Because load commands go through LiteBus, every write triggers the same event fan-out as an HTTP write — including query-view upserts. A bulk import of 10,000 scheme rows generates ~70,000 query-view upserts. For very large imports, invoke query view rebuild after the load so the view is rebuilt once in batch instead of row-at-a-time.

Completeness Score

Purpose. Compute a per-entity “completeness” score that measures how fully populated each row is across its expected fields. Writes CompletenessScore directly to the [query].*List tables (bypassing the normal event-handler path for performance).

Trigger. Manual (via the API’s Container Apps Jobs Operator role) or scheduled — typically a nightly run.

Flow.

scan [app].{Address,Company,Scheme,...} → for each entity:
read required/optional field set from metadata
compute score (weighted, per-entity-type algorithm)
update [query].{Address,Company,Scheme}List.CompletenessScore
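
The scoring step above reduces to a weighted ratio of populated fields. A minimal sketch, in Python for brevity (CompletenessCalculator itself is C#, and the real field sets and weights come from metadata — the names and weights below are illustrative):

```python
def completeness_score(row: dict, weighted_fields: dict) -> float:
    """Weighted fraction of expected fields that are populated.

    weighted_fields maps field name -> weight. In the real job the
    per-entity-type field set is read from metadata, not hardcoded.
    """
    total = sum(weighted_fields.values())
    filled = sum(w for f, w in weighted_fields.items()
                 if row.get(f) not in (None, "", []))
    return round(filled / total, 2) if total else 0.0

# Illustrative Scheme field set: required fields weighted higher.
scheme_fields = {"Name": 2.0, "Address": 2.0, "Developer": 1.0, "UnitCount": 1.0}
score = completeness_score({"Name": "Foo", "Address": "1 High St"}, scheme_fields)
# 4.0 of 6.0 total weight populated -> 0.67
```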

Source and deployment. src/services/job/completionscore/. Deployed to ca-jobcompscore. 4-hour replica timeout.

External dependencies. SQL (read + write), Key Vault (connection string), managed identity mi-jobcompscore-01.

Notes. The scores are not recomputed on every entity write — that would make every save slower. Instead, the nightly job recomputes the whole set. Fresh scores for newly-created rows show up the next morning; bulk imports may want to kick the job off explicitly rather than wait for the schedule.

The score calculation logic is documented alongside the job source; see CompletenessCalculator for the per-entity rules.

Query View Rebuild

Purpose. Rebuild [query].*List tables from their [app].* sources. Covers every denormalised table: AddressList, CompanyList, SchemeList, InvestmentEventList, OccupierEventList, PortfolioList.

Trigger. Manual or scheduled (irregular — typically after deployments that change the denormalisation shape, bulk imports, or recovery from event-handler failures).

Flow.

for each Query View Service:
truncate the [query] table
page through the [app] source in batches (default 1000 rows)
load with relations via Mapper.IncludeRelations
project to DTO via Mapper.MapToListItem
populate aggregate counts with one GROUP BY per aggregate
bulk insert into [query]
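
The loop above is truncate-then-repopulate in pages. A schematic in Python (the real services are C#/EF Core; `read_batch` stands in for the paged relation-including read and `map_to_list_item` for Mapper.MapToListItem — all names here are stand-ins):

```python
def rebuild_view(read_batch, truncate, bulk_insert, map_to_list_item,
                 batch_size=1000):
    """Truncate and repopulate one [query].*List table in batches.

    read_batch(offset, limit) returns the next page of source rows;
    bulk_insert writes a batch of projected list items.
    """
    truncate()
    offset, total = 0, 0
    while True:
        rows = read_batch(offset, batch_size)
        if not rows:
            break
        bulk_insert([map_to_list_item(r) for r in rows])
        total += len(rows)
        offset += batch_size
    return total

# Usage with an in-memory stand-in source of 2,500 rows:
source = list(range(2500))
sink = []
n = rebuild_view(lambda o, l: source[o:o + l], sink.clear, sink.extend, str)
# n == 2500, written in batches of 1000 / 1000 / 500
```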

Source and deployment. src/services/job/rebuildqueryviews/. Deployed to ca-jobqueryvws. 4-hour replica timeout.

External dependencies. SQL only. Uses FromSqlRaw("SELECT ... WITH (NOLOCK)") on the read path to avoid blocking ongoing writes — query-view consistency is eventual, so a near-current snapshot is acceptable during rebuild.

Notes. Deeply covered in Query Views, which explains the mapper ↔ service layering, the column list ↔ FTI ↔ mapper four-site edit requirement, and when you’d actually run this vs letting event handlers maintain the rows incrementally. A full production rebuild completes in 20–40 minutes; per-entity rebuilds (e.g. only CompanyList) scale down accordingly.

Currency Import

Purpose. Pull exchange-rate data from the BI lakehouse (ECB-sourced currency conversion tables) and load it into Formation’s CurrencyConversion table so the API can report property values in EUR / GBP / USD regardless of the currency they were originally captured in.

Trigger. Scheduled (daily) or manual.

Source of truth. The lakehouse — a separate Microsoft SQL-accessible store — exposes a [ECBExchangeRates].[CurrencyConversion] view with raw ECB feed data. The connection string lives in Key Vault as warehouse-db-connection-string and is consumed via ConnectionStrings__WarehouseDb.

Flow.

query lakehouse:
SELECT CurrencyCode, ConversionCurrencyCode, ConversionRate,
EffectiveDate, CurrencyName, CountryCode, CountryName
FROM [ECBExchangeRates].[CurrencyConversion]
pivot:
ECB publishes one row per (currency pair, date) — all against EUR
Group by (CurrencyCode, EffectiveDate)
Prefer direct X→GBP / X→USD pairs where the lakehouse has them
Otherwise compute cross-rates via EUR:
X→GBP = (X→EUR) / (GBP→EUR)
X→USD = (X→EUR) / (USD→EUR)
write Formation:
upsert into [app].[CurrencyConversion] in 500-row batches
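
The cross-rate fallback is plain arithmetic over the EUR-based pairs. A sketch of the pivot for one (currency, date) group — Python for brevity, since the real logic lives in SqlWarehouseDataReader.cs, and the rate values in the example are made up, not real ECB data:

```python
def cross_rates(x_to_eur, gbp_to_eur, usd_to_eur,
                x_to_gbp=None, x_to_usd=None):
    """Return (X->GBP, X->USD): prefer direct pairs where the lakehouse
    has them, otherwise cross via EUR as described above."""
    gbp = x_to_gbp if x_to_gbp is not None else x_to_eur / gbp_to_eur
    usd = x_to_usd if x_to_usd is not None else x_to_eur / usd_to_eur
    return gbp, usd

# Illustrative rates: 1 SEK = 0.09 EUR, 1 GBP = 1.20 EUR, 1 USD = 0.90 EUR.
gbp, usd = cross_rates(0.09, 1.20, 0.90)
# SEK->GBP = 0.09 / 1.20 = 0.075; SEK->USD = 0.09 / 0.90 = 0.10
```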

Source and deployment. src/services/job/currencyimport/. SqlWarehouseDataReader.cs owns the lakehouse read + pivot; the CurrencyImportWorker drives the overall orchestration. Deployed to ca-jobcurimp.

External dependencies. Lakehouse (read-only), Formation SQL (write), Key Vault (holds both connection strings).

Notes.

  • The lakehouse connection string is optional (“nullable”) at deploy time — warehouseDbConnectionString in Bicep is annotated @secure() and is only written to Key Vault on first deploy or explicit rotation. This prevents routine redeploys from overwriting a stable external-system credential with an empty default. If the secret is absent at runtime, the job fails fast on startup with a clear “WarehouseDb connection string is not configured” error rather than silently no-op’ing.
  • Only rows with a non-null ConversionRate are considered; ECB occasionally publishes empty rows for currencies that have no quote on a given date.
  • The pivot logic tolerates either direct-pair or cross-rate data. If the lakehouse changes format (e.g. adds a direct GBP pair for a currency that was previously cross-rated), the output is unchanged.

Duplicate Detection

Purpose. Identify likely-duplicate entities (two address rows that are the same physical address, two company rows that are the same company, two schemes at the same address) and flag them for human review.

Trigger. Manual. Scheduling is possible, but in practice the job is run operationally, when the backlog of potential duplicates is large enough to warrant a pass.

Flow.

for each entity type in (Address, Company, Scheme):
run the type's IDuplicateDetectionStrategy
score candidate pairs by similarity
write flagged pairs to [app].[DuplicateCandidate]

Strategies.

  • AddressDuplicateStrategy — normalised-address matching with tolerance for punctuation, casing, and postcode spacing variations.
  • CompanyDuplicateStrategy — company-number-primary matching with name-fallback for entities without a company number; tolerates legal-suffix variants (Ltd / Limited, Plc / PLC).
  • SchemeDuplicateStrategy — same address + overlapping company / developer set; tolerates name variations.
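
The normalisation the address strategy depends on can be sketched as: lowercase, strip punctuation, collapse whitespace, canonicalise postcode spacing. A Python sketch (AddressDuplicateStrategy itself is C# and its rules are richer; the function name and regexes here are illustrative):

```python
import re

def normalise_address(addr):
    """Canonical form for fuzzy equality: lowercase, punctuation and
    extra whitespace removed, postcode spacing collapsed."""
    s = addr.lower()
    s = re.sub(r"[^\w\s]", " ", s)        # punctuation -> space
    s = re.sub(r"\s+", " ", s).strip()    # collapse whitespace
    # collapse UK-style postcode spacing: "ec1a 1bb" -> "ec1a1bb"
    s = re.sub(r"\b([a-z]{1,2}\d[a-z\d]?) (\d[a-z]{2})\b", r"\1\2", s)
    return s

a = normalise_address("1, High St., London EC1A 1BB")
b = normalise_address("1 high st london EC1A1BB")
# both normalise to "1 high st london ec1a1bb", so the pair is a candidate
```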

Source and deployment. src/services/job/duplicatedetection/. Each strategy implements IDuplicateDetectionStrategy and is listed in the worker’s Strategies array. Deployed to ca-jobdedup.

External dependencies. SQL only.

Notes. Duplicate flags are advisory — the worker writes to a separate table rather than hard-merging rows. A human-driven review UI (not yet built) is planned to consume these flags and drive merge operations through the normal LiteBus command pipeline.

Adding a new duplicate strategy is a two-file change: implement IDuplicateDetectionStrategy in src/common/services/DuplicateDetection/ (so the logic is usable from unit tests and from the worker) and add an instance to the Strategies array in DuplicateDetectionWorker.

Shared Patterns

All five jobs follow the same skeleton so operational tooling treats them uniformly:

  • IHostedService worker — single StartAsync that runs to completion, then calls _hostApplicationLifetime.StopApplication(). No HTTP listener.
  • Progress tracking via IJobProgressService — every job writes rows to [app].JobExecution with JobTypeId (4 = CurrencyImport, 6 = DuplicateDetection, etc.), trigger source, container-execution name, and success/failure state. The API exposes /JobExecutions so operators can see history and re-trigger via the admin UI.
  • Env-var trigger metadata — TRIGGER_TYPE (Manual / Scheduled / Event), JOB_EXECUTION_ID (for re-attach when the UI kicked the job off and wants its row pre-created), CONTAINER_APP_JOB_EXECUTION_NAME (Azure-assigned). Defaults cover the manual-invocation case.
  • Graceful cancellation — OperationCanceledException is caught cleanly; the progress row is marked cancelled rather than errored. Jobs stopped mid-flight don’t poison the progress table.
  • Key Vault-backed connection strings — nothing is hardcoded; each job’s managed identity has Key Vault Secrets User on the environment’s vault.
  • 4-hour replicaTimeout — long enough for the largest historical run; a shorter timeout would cut a full rebuild off before it finishes.
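
The trigger-metadata convention amounts to a few lines at worker startup. A language-agnostic sketch in Python (the actual workers are C# IHostedServices; the "local" fallback for the execution name is an assumption for illustration):

```python
import os

def read_trigger_metadata(env=None):
    """Env-var trigger metadata with manual-invocation defaults."""
    env = os.environ if env is None else env
    return {
        "trigger_type": env.get("TRIGGER_TYPE", "Manual"),
        # present only when the admin UI pre-created the JobExecution row
        "job_execution_id": env.get("JOB_EXECUTION_ID"),
        "execution_name": env.get("CONTAINER_APP_JOB_EXECUTION_NAME", "local"),
    }

meta = read_trigger_metadata(env={})
# with no env vars set: manual trigger, no pre-created row, "local" name
```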

Adding a new job means following the same shape: new folder under src/services/job/, new Bicep entry using aca_job.bicep, new managed identity + RBAC, new JobTypeId, new IJobProgressService registration. See the existing jobs as templates.