Background Jobs
Formation runs five Container App Jobs alongside the long-lived API and web apps. Each has its own Dockerfile, its own managed identity, and its own trigger model (event / schedule / manual). They share the same SQL database and Key Vault as the API, but none of them accept HTTP traffic — jobs are scheduled or enqueued, never called directly.
This page documents each job: what it does, what triggers it, where its source lives, and the external dependencies it owns. For the topology / scaling / identity detail, see Deployment Topology → Jobs.
Table of Contents

- At a Glance
- Data Load — `jobload`
- Completeness Score — `jobcompscore`
- Query View Rebuild — `jobqueryvws`
- Currency Import — `jobcurimp`
- Duplicate Detection — `jobdedup`
- Shared Patterns
At a Glance
| Job | Container App | Source | Trigger | Typical frequency |
|---|---|---|---|---|
| Data Load | ca-jobload | load/ | Azure Storage Queue (KEDA scaler) | Event-driven — one execution per import file |
| Completeness Score | ca-jobcompscore | completionscore/ | Manual / scheduled | Nightly |
| Query View Rebuild | ca-jobqueryvws | rebuildqueryviews/ | Manual / scheduled | On demand after schema / mapper changes |
| Currency Import | ca-jobcurimp | currencyimport/ | Scheduled (daily) | Daily |
| Duplicate Detection | ca-jobdedup | duplicatedetection/ | Manual | Weekly-ish |
All jobs share the pattern described in Shared Patterns below: IHostedService worker, IJobProgressService for tracking, env-var-driven trigger metadata, graceful cancellation.
Data Load — jobload
Purpose. Ingest bulk data from files dropped into the data-load blob container. A message on the data-load storage queue points the job at a file; the worker parses it, dispatches the appropriate LiteBus commands, and marks the job complete.
Trigger. KEDA Azure Storage Queue scaler. When a message arrives on the queue, a replica spins up, processes it, then scales back to zero.
Flow.
```
[blob drop] → [queue message] → jobload replica starts
  │
  ▼
parse file (CSV / XLSX)
  │
  ▼
dispatch CreateXxx / UpdateXxx commands via LiteBus
  │
  ▼
write JobExecution progress rows
  │
  ▼
acknowledge / delete queue message
```

Source and deployment. src/services/job/load/. Dockerised, published via dotnet-service-deploy.yml, deployed to the ca-jobload Container App Job. Scales to zero between messages.
External dependencies. Azure Storage (queue + blob), SQL (writes through the same API command pipeline), Key Vault (SQL connection string).
Notes. Because load commands go through LiteBus, every write triggers the same event fan-out as an HTTP write — including query-view upserts. A bulk import of 10,000 scheme rows generates ~70,000 query-view upserts. For very large imports, invoke query view rebuild after the load so the view is rebuilt once in batch instead of row-at-a-time.
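The receive → process → delete contract behind the queue trigger can be sketched as follows. This is an illustrative Python sketch with an in-memory queue standing in for the Azure Storage queue; the function names (`process_import`, `parse_file`, `dispatch_row`) are hypothetical, and the real worker is a .NET `IHostedService`:

```python
from collections import deque

def process_import(queue: deque, progress: list) -> None:
    """Drain the queue, deleting each message only after it is fully processed.

    Deleting on success (rather than on receipt) means a crashed replica
    leaves the message visible again, so the import is retried.
    """
    while queue:
        message = queue[0]                    # peek; don't remove yet
        rows = parse_file(message["blob"])    # parse the dropped CSV/XLSX
        for row in rows:
            dispatch_row(row, progress)       # stand-in for a LiteBus command
        progress.append(("complete", message["blob"]))
        queue.popleft()                       # ack: delete only on success

def parse_file(blob_name: str) -> list[dict]:
    # Stub: a real implementation downloads the blob and parses its rows.
    return [{"name": f"row-{i}", "source": blob_name} for i in range(3)]

def dispatch_row(row: dict, progress: list) -> None:
    progress.append(("dispatched", row["name"]))
```

On a real Storage queue the "peek" is a receive with a visibility timeout and the "ack" is an explicit delete; an empty queue is what lets KEDA scale the job back to zero.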
Completeness Score — jobcompscore
Purpose. Compute a per-entity “completeness” score that measures how fully populated each row is across its expected fields. Writes CompletenessScore directly to the [query].*List tables (bypassing the normal event-handler path for performance).
Trigger. Manual (via the API’s Container Apps Jobs Operator role) or scheduled — typically a nightly run.
Flow.
```
scan [app].{Address,Company,Scheme,...}
→ for each entity:
    read required/optional field set from metadata
    compute score (weighted, per-entity-type algorithm)
    update [query].{Address,Company,Scheme}List.CompletenessScore
```

Source and deployment. src/services/job/completionscore/. Deployed to ca-jobcompscore. 4-hour replica timeout.
External dependencies. SQL (read + write), Key Vault (connection string), managed identity mi-jobcompscore-01.
Notes. The scores are not recomputed on every entity write — that would make every save slower. Instead, the nightly job recomputes the whole set. Fresh scores for newly-created rows show up the next morning; bulk imports may want to kick the job off explicitly rather than wait for the schedule.
The score calculation logic is documented alongside the job source; see CompletenessCalculator for the per-entity rules.
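The general shape of a weighted completeness score can be sketched as below. This is an illustrative Python sketch only: the actual per-entity rules live in CompletenessCalculator, and the field names and weights here are hypothetical, whereas the real job reads the required/optional field set from metadata.

```python
def completeness_score(entity: dict, weights: dict[str, int]) -> int:
    """Weighted percentage of expected fields that are populated."""
    total = sum(weights.values())
    populated = sum(
        w for field, w in weights.items()
        if entity.get(field) not in (None, "", [])
    )
    return round(100 * populated / total)

# Hypothetical weighting for a Company row — not the real field set.
company_weights = {"Name": 3, "CompanyNumber": 2, "Address": 2, "Website": 1}
```

A row missing only a low-weight field (e.g. Website) still scores high, which is the point of weighting: the score should reflect how usable the row is, not just a raw field count.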
Query View Rebuild — jobqueryvws
Purpose. Rebuild [query].*List tables from their [app].* sources. Covers every denormalised table: AddressList, CompanyList, SchemeList, InvestmentEventList, OccupierEventList, PortfolioList.
Trigger. Manual or scheduled (irregular — typically after deployments that change the denormalisation shape, bulk imports, or recovery from event-handler failures).
Flow.
```
for each Query View Service:
    truncate the [query] table
    page through the [app] source in batches (default 1000 rows)
    load with relations via Mapper.IncludeRelations
    project to DTO via Mapper.MapToListItem
    populate aggregate counts with one GROUP BY per aggregate
    bulk insert into [query]
```

Source and deployment. src/services/job/rebuildqueryviews/. Deployed to ca-jobqueryvws. 4-hour replica timeout.
External dependencies. SQL only. Uses FromSqlRaw("SELECT ... WITH (NOLOCK)") on the read path to avoid blocking ongoing writes — query-view consistency is eventual, so a near-current snapshot is acceptable during rebuild.
Notes. Covered in depth in Query Views, which explains the mapper ↔ service layering, the column list ↔ FTI ↔ mapper four-site edit requirement, and when you’d actually run this rather than letting event handlers maintain the rows incrementally. A full production rebuild completes in 20–40 minutes; per-entity rebuilds (e.g. only CompanyList) scale down accordingly.
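The truncate-then-page shape of the rebuild can be sketched as follows — an illustrative Python sketch with lists standing in for the [app] source and [query] table, and a caller-supplied `project` function standing in for the Mapper projection. Only the 1000-row batch size comes from the flow above; everything else is a stand-in.

```python
from math import ceil

BATCH_SIZE = 1000  # documented default page size

def rebuild_view(source: list[dict], project) -> list[dict]:
    """Truncate-and-rebuild: page through the source in fixed-size batches,
    project each row, and bulk-insert one batch at a time."""
    view: list[dict] = []  # stands in for the [query] table, post-truncate
    for page in range(ceil(len(source) / BATCH_SIZE)):
        batch = source[page * BATCH_SIZE:(page + 1) * BATCH_SIZE]
        view.extend(project(row) for row in batch)  # bulk insert one batch
    return view
```

Paging keeps memory bounded regardless of table size: only one batch of source rows and projected DTOs is materialised at a time, which is what makes a multi-hour rebuild of a large table feasible inside a single replica.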
Currency Import — jobcurimp
Purpose. Pull exchange-rate data from the BI lakehouse (ECB-sourced currency conversion tables) and load it into Formation’s CurrencyConversion table so the API can report property values in EUR / GBP / USD regardless of the currency they were originally captured in.
Trigger. Scheduled (daily) or manual.
Source of truth. The lakehouse — a separate Microsoft SQL-accessible store — exposes a [ECBExchangeRates].[CurrencyConversion] view with raw ECB feed data. The connection string lives in Key Vault as warehouse-db-connection-string and is consumed via ConnectionStrings__WarehouseDb.
Flow.
```
query lakehouse:
    SELECT CurrencyCode, ConversionCurrencyCode, ConversionRate,
           EffectiveDate, CurrencyName, CountryCode, CountryName
    FROM [ECBExchangeRates].[CurrencyConversion]

pivot: ECB publishes one row per (currency pair, date) — all against EUR
    Group by (CurrencyCode, EffectiveDate)
    Prefer direct X→GBP / X→USD pairs where the lakehouse has them
    Otherwise compute cross-rates via EUR:
        X→GBP = (X→EUR) / (GBP→EUR)
        X→USD = (X→EUR) / (USD→EUR)

write Formation:
    upsert into [app].[CurrencyConversion] in 500-row batches
```

Source and deployment. src/services/job/currencyimport/. SqlWarehouseDataReader.cs owns the lakehouse read + pivot; the CurrencyImportWorker drives the overall orchestration. Deployed to ca-jobcurimp.
External dependencies. Lakehouse (read-only), Formation SQL (write), Key Vault (holds both connection strings).
Notes.

- The lakehouse connection string is nullable at deploy time — `warehouseDbConnectionString` in Bicep is annotated `@secure()` and only written to Key Vault on first deploy or explicit rotation. This prevents routine redeploys from overwriting a stable external-system credential with an empty default. If the parameter is absent, the job fails fast on startup with a clear “WarehouseDb connection string is not configured” error rather than silently no-op’ing.
- Only rows with a non-null `ConversionRate` are considered; the ECB occasionally publishes empty rows for currencies that have no quote on a given date.
- The pivot logic tolerates either direct-pair or cross-rate data. If the lakehouse changes format (e.g. adds a direct GBP pair for a currency that was previously cross-rated), the output is unchanged.
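The prefer-direct-else-cross-rate rule from the flow above is small enough to show concretely. This is an illustrative Python sketch, not SqlWarehouseDataReader itself; the dictionary shapes are assumptions, but the formula X→T = (X→EUR) / (T→EUR) is exactly the one documented in the pivot step.

```python
def convert_rate(currency: str, target: str,
                 to_eur: dict[str, float],
                 direct: dict[tuple[str, str], float]) -> float:
    """Rate for 1 unit of `currency` expressed in `target`.

    Prefers a direct pair when the lakehouse supplies one; otherwise
    cross-rates via EUR: X→T = (X→EUR) / (T→EUR).
    """
    if (currency, target) in direct:
        return direct[(currency, target)]
    return to_eur[currency] / to_eur[target]
```

Because both branches answer the same question, a lakehouse format change (a direct pair appearing for a previously cross-rated currency) changes which branch runs but not the output — which is the tolerance the notes describe.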
Duplicate Detection — jobdedup
Purpose. Identify likely-duplicate entities (two address rows that are the same physical address, two company rows that are the same company, two schemes at the same address) and flag them for human review.
Trigger. Manual. Scheduling is possible, but in practice the job is triggered operationally, when the backlog of potential duplicates is large enough to warrant a pass.
Flow.
```
for each entity type in (Address, Company, Scheme):
    run the type's IDuplicateDetectionStrategy
    score candidate pairs by similarity
    write flagged pairs to [app].[DuplicateCandidate]
```

Strategies.
- AddressDuplicateStrategy — normalised-address matching with tolerance for punctuation, casing, and postcode spacing variations.
- CompanyDuplicateStrategy — company-number-primary matching with name-fallback for entities without a company number; tolerates legal-suffix variants (Ltd / Limited, Plc / PLC).
- SchemeDuplicateStrategy — same address + overlapping company / developer set; tolerates name variations.
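The kind of normalisation AddressDuplicateStrategy performs can be sketched as below. This is an illustrative Python sketch, not the strategy's actual rules: it shows one plausible way to tolerate punctuation, casing, and postcode-spacing variation, which are the tolerances the list above names.

```python
import re

def normalise_address(raw: str) -> str:
    """Canonical form tolerant of punctuation, casing, and postcode spacing."""
    s = raw.lower()
    s = re.sub(r"[^\w\s]", " ", s)       # drop punctuation
    s = re.sub(r"\s+", " ", s).strip()   # collapse whitespace
    # Join UK-style postcodes so "EC1A 1BB" and "EC1A1BB" compare equal.
    s = re.sub(r"\b([a-z]{1,2}\d[a-z\d]?) ?(\d[a-z]{2})\b", r"\1\2", s)
    return s
```

Two rows are then candidate duplicates when their normalised forms match (or are within some similarity threshold); the raw values are kept untouched, since the job only flags pairs rather than merging them.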
Source and deployment. src/services/job/duplicatedetection/. Each strategy implements IDuplicateDetectionStrategy and is listed in the worker’s Strategies array. Deployed to ca-jobdedup.
External dependencies. SQL only.
Notes. Duplicate flags are advisory — the worker writes to a separate table rather than hard-merging rows. A human-driven review UI (not yet built) is planned to consume these flags and drive merge operations through the normal LiteBus command pipeline.
Adding a new duplicate strategy is a two-file change: implement IDuplicateDetectionStrategy in src/common/services/DuplicateDetection/ (so the logic is usable from unit tests and from the worker) and add an instance to the Strategies array in DuplicateDetectionWorker.
Shared Patterns
All five jobs follow the same skeleton so operational tooling treats them uniformly:

- `IHostedService` worker — a single `StartAsync` that runs to completion, then calls `_hostApplicationLifetime.StopApplication()`. No HTTP listener.
- Progress tracking via `IJobProgressService` — every job writes rows to `[app].JobExecution` with `JobTypeId` (4 = CurrencyImport, 6 = DuplicateDetection, etc.), trigger source, container-execution name, and success/failure state. The API exposes `/JobExecutions` so operators can see history and re-trigger via the admin UI.
- Env-var trigger metadata — `TRIGGER_TYPE` (Manual/Scheduled/Event), `JOB_EXECUTION_ID` (for re-attach when the UI kicked the job off and wants its row pre-created), `CONTAINER_APP_JOB_EXECUTION_NAME` (Azure-assigned). Defaults cover the manual-invocation case.
- Graceful cancellation — `OperationCanceledException` is caught cleanly; the progress row is marked cancelled rather than errored. Jobs stopped mid-flight don’t poison the progress table.
- Key Vault-backed connection strings — nothing is hardcoded; each job’s managed identity has Key Vault Secrets User on the environment’s vault.
- 4-hour `replicaTimeout` — long enough for the largest historical run; shorter would cause a full rebuild to truncate.
Adding a new job means following the same shape: new folder under src/services/job/, new Bicep entry using aca_job.bicep, new managed identity + RBAC, new JobTypeId, new IJobProgressService registration. See the existing jobs as templates.
See also
- Deployment topology → Jobs — scaling, identity, and trigger configuration
- Query views — what `jobqueryvws` and `jobcompscore` write
- CQRS flow — how `jobload` dispatches commands and why its writes fan out via events