Need

When the Enricher Cache feature is enabled in Semarchy xDM, the system generates a hash key based on the enricher input values. This hash key uniquely identifies a given set of inputs and allows the system to reuse previously computed results instead of calling the external API again.


Customers often ask:

  • How exactly is the hash key generated?
  • Does the order of input fields influence the hash?
  • Can we reproduce the hash in the database (e.g., using STANDARD_HASH(..., 'SHA1') in Oracle)?
  • Can the hash be decoded?
  • Is additional text or metadata included in the hash generation?


Understanding how the hash key is computed is important when:
  • Investigating cache behavior.
  • Comparing input vs. output hash values.
  • Attempting to preload or externally compute cache entries.
  • Debugging unexpected cache misses.


Summarized Solution

The enricher cache hash key is generated as follows:

  1. A JSON object is built using the enricher inputs:
    • Property names = input names.
    • Property values = input values.
    • Dates are serialized using ISO 8601 format.
  2. The JSON object is serialized into a string:
    • Input names are sorted in ascending order.
    • This guarantees consistent ordering.
  3. A SHA-1 hash is computed on the resulting JSON string.
  4. The hash is converted to a hexadecimal string.


Important points:

  • The order of input fields does not influence the hash (because properties are sorted).
  • The algorithm is implemented in Java and is not easily reproducible in database SQL.
  • The hash cannot be decoded (SHA-1 is one-way).
  • No additional hidden text key is added to the hashed content.
  • Strictly identical input names and values must always produce identical hash keys.


Detailed Solution


1. JSON Object Construction

When an enricher is executed with caching enabled, xDM first constructs a JSON object representing all input parameters.


Structure:

  • JSON property name = enricher input name.
  • JSON property value = enricher input value.


Special handling:

  • Dates are serialized in ISO 8601 format, for example: 2022-03-08T16:52:39


Example JSON object before serialization:

{
  "dateInput": "2022-03-08T16:52:39",
  "numericInput": 55,
  "stringInput1": "valueInput1",
  "stringInput2": "valueInput2"
}
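The construction step might be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the xDM implementation: the class EnricherInputs, its helper methods, and the date pattern uuuu-MM-dd'T'HH:mm:ss (seconds precision, no time zone) are hypothetical choices made to match the example above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps enricher input names to JSON-ready values.
// Not the xDM implementation; the date pattern below is an assumption.
final class EnricherInputs {

    // Assumed ISO 8601 pattern matching the example (seconds precision, no zone).
    static final DateTimeFormatter ISO_DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    // Converts a raw input value into the value stored in the JSON object.
    static Object toJsonValue(Object value) {
        if (value instanceof LocalDateTime) {
            // Dates become ISO 8601 strings.
            return ((LocalDateTime) value).format(ISO_DATE_TIME);
        }
        // Numbers, strings and nulls are kept as-is.
        return value;
    }

    // Builds the property map: input name -> input value (definition order kept).
    static Map<String, Object> build(Map<String, Object> rawInputs) {
        Map<String, Object> json = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : rawInputs.entrySet()) {
            json.put(e.getKey(), toJsonValue(e.getValue()));
        }
        return json;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("stringInput2", "valueInput2");
        raw.put("dateInput", LocalDateTime.of(2022, 3, 8, 16, 52, 39));
        raw.put("numericInput", 55);
        raw.put("stringInput1", "valueInput1");
        System.out.println(build(raw));
    }
}
```

At this point the map still holds the inputs in definition order; in this sketch, sorting is left to the serialization step.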

2. Deterministic JSON Serialization

To ensure consistent hashing:

  • Input names (JSON properties) are sorted in ascending alphabetical order.
  • The JSON object is serialized into a string.


Resulting JSON string example:

{"dateInput":"2022-03-08T16:52:39","numericInput":55,"stringInput1":"valueInput1","stringInput2":"valueInput2"}

Because of the sorting step:

  • The order in which inputs are defined in the model does not affect the hash.
  • The same input names and values will always produce the same JSON string.


This guarantees deterministic hash generation.
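A minimal Java sketch of the sorting and serialization step follows. It is hypothetical: the JSON library xDM uses, its escaping rules, and its exact string comparator are not documented here. The sketch assumes compact output with no whitespace, JSON null for null values, and String.compareTo ordering through a TreeMap (which places uppercase letters before lowercase ones).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of deterministic serialization; not the xDM serializer.
// Assumes compact output (no whitespace) and JSON null for null values.
final class SortedJson {

    // Serializes the inputs with property names in ascending order.
    static String serialize(Map<String, Object> inputs) {
        // TreeMap orders String keys by natural (String.compareTo) order.
        Map<String, Object> sorted = new TreeMap<>(inputs);
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : sorted.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null) {
                sb.append("null");
            } else if (v instanceof Number || v instanceof Boolean) {
                sb.append(v); // written with Java's toString form
            } else {
                sb.append(quote(v.toString()));
            }
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("stringInput2", "valueInput2");
        a.put("numericInput", 55);
        a.put("dateInput", "2022-03-08T16:52:39");
        a.put("stringInput1", "valueInput1");

        Map<String, Object> b = new LinkedHashMap<>();
        b.put("dateInput", "2022-03-08T16:52:39");
        b.put("stringInput1", "valueInput1");
        b.put("stringInput2", "valueInput2");
        b.put("numericInput", 55);

        System.out.println(serialize(a));
        // Different insertion orders, same serialized string.
        System.out.println(serialize(a).equals(serialize(b))); // true
    }
}
```

With the example inputs, serialize returns the string shown above whichever order the inputs are added in. In this sketch numbers use Java's toString form, so 55 and 55.0 would serialize differently, which illustrates how seemingly identical inputs can diverge.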


3. SHA-1 Hash Computation

A SHA-1 hash is calculated from the serialized JSON string.


Process:

  • Take the JSON string.
  • Compute SHA-1 digest.
  • Convert the binary hash to a hexadecimal string.


This hexadecimal string is stored as the enricher cache key.
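The hashing step can be sketched with the standard java.security.MessageDigest API. The UTF-8 encoding of the JSON string and the lowercase hexadecimal output are assumptions; the source only states that a SHA-1 digest is computed and converted to hexadecimal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the hashing step: SHA-1 over the serialized JSON,
// rendered as lowercase hex. UTF-8 encoding and lowercase hex are assumptions.
final class Sha1Key {

    static String sha1Hex(String json) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value and pad each byte to two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"dateInput\":\"2022-03-08T16:52:39\",\"numericInput\":55,"
                + "\"stringInput1\":\"valueInput1\",\"stringInput2\":\"valueInput2\"}";
        // Prints a 40-character hexadecimal key (160-bit SHA-1 digest).
        System.out.println(sha1Hex(json));
    }
}
```

A SHA-1 digest is 20 bytes, so the key is always 40 hexadecimal characters long, however many inputs the enricher has.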


4. Can the Hash Be Decoded?

NO. SHA-1 is a one-way cryptographic hash function, and it is not reversible.

It is therefore impossible to "decode" the hash to retrieve the original inputs.

If traceability is required, the original input values must be logged or stored separately.
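One way to keep that traceability is to record the original JSON next to each key at the time the key is computed. The sketch below is a hypothetical helper (CacheKeyLog is not part of xDM), and it assumes the same UTF-8, lowercase-hex hashing as above; in practice the pairs would be written to a table or log rather than held in memory.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical traceability helper, not an xDM feature: since a SHA-1 key cannot
// be reversed, the original JSON is kept next to the key when it is computed.
final class CacheKeyLog {
    private final Map<String, String> jsonByKey = new HashMap<>();

    // Computes the SHA-1 hex key of the JSON string and logs the pair.
    String record(String json) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b & 0xff));
            String key = hex.toString();
            jsonByKey.put(key, json);
            return key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the logged JSON for a key, or null if it was never recorded.
    String lookup(String key) {
        return jsonByKey.get(key);
    }

    public static void main(String[] args) {
        CacheKeyLog log = new CacheKeyLog();
        String key = log.record("{\"stringInput1\":\"valueInput1\"}");
        System.out.println(key + " -> " + log.lookup(key));
    }
}
```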


5. Does Field Order Influence the Hash?

NO, because:

  • Input names are sorted before serialization.
  • The JSON string is deterministic.


Changing the order of fields in the enricher definition does not change the hash, provided the input names and values are strictly identical.


If two executions produce different hashes:

  • Either the input values differ (including whitespace, case, null vs. empty, etc.),
  • Or an unexpected behavior has occurred that should be investigated.


If strictly identical inputs produce different hashes, this would indicate a defect.
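The sketch below illustrates the first case: inputs that look alike to a user still produce different JSON strings and therefore different keys. The single-input JSON strings are hypothetical, and serializing a null input as JSON null is an assumption (an implementation could instead omit the property).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo, not xDM code: values that look alike to a user still
// serialize to different JSON strings and therefore to different SHA-1 keys.
final class KeyVariants {

    // SHA-1 of the UTF-8 bytes, as lowercase hex (encoding and case assumed).
    static String sha1Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] variants = {
            "{\"stringInput1\":\"valueInput1\"}",  // reference value
            "{\"stringInput1\":\"valueInput1 \"}", // trailing space
            "{\"stringInput1\":\"ValueInput1\"}",  // different case
            "{\"stringInput1\":\"\"}",             // empty string
            "{\"stringInput1\":null}"              // null (assumed written as JSON null)
        };
        for (String json : variants) {
            // Each variant yields its own 40-character key.
            System.out.println(sha1Hex(json) + "  " + json);
        }
    }
}
```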


6. Can the Hash Be Reproduced in the Database?

Not reliably.

Although SHA-1 exists in databases such as Oracle (STANDARD_HASH(..., 'SHA1')), reproducing the exact same hash externally is difficult because:

  • The algorithm is implemented in Java.
  • The exact JSON serialization format must match perfectly.
  • Sorting rules and formatting must be identical.
  • Date formatting must match ISO 8601 exactly.
  • Null handling must match the Java serializer's behavior (for example, whether a null input is written as null or omitted).


Even small differences (spacing, encoding, quoting) will produce a different hash.


Therefore:

  • Computing the same cache key in pure SQL is not recommended.
  • Preloading or externally generating cache keys is not officially supported.
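The following hypothetical sketch illustrates why an external reimplementation is fragile: each rendering below describes the same logical input, yet each yields a different SHA-1 key, and the same text hashed under two character encodings also differs when it contains non-ASCII characters. Only the compact, sorted form follows the format of the example in section 2; the other renderings are variations a SQL implementation could plausibly introduce.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo, not xDM code: small serialization or encoding differences
// that an external (e.g. SQL) reimplementation could introduce all change the key.
final class FormatVariants {

    // SHA-1 of the string encoded with the given charset, as lowercase hex.
    static String sha1Hex(String s, Charset cs) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(cs));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] renderings = {
            "{\"dateInput\":\"2022-03-08T16:52:39\",\"numericInput\":55}",     // compact, sorted
            "{\"dateInput\": \"2022-03-08T16:52:39\", \"numericInput\": 55}",  // extra spaces
            "{\"numericInput\":55,\"dateInput\":\"2022-03-08T16:52:39\"}",     // unsorted
            "{\"dateInput\":\"2022-03-08T16:52:39.000\",\"numericInput\":55}", // milliseconds
            "{\"dateInput\":\"2022-03-08 16:52:39\",\"numericInput\":55}",     // non-ISO date
            "{\"dateInput\":\"2022-03-08T16:52:39\",\"numericInput\":55.0}"    // decimal number
        };
        for (String json : renderings) {
            System.out.println(sha1Hex(json, StandardCharsets.UTF_8) + "  " + json);
        }
        // Same text, different byte encoding: the non-ASCII character changes the digest.
        String accented = "{\"stringInput1\":\"Caf\u00e9\"}";
        System.out.println(sha1Hex(accented, StandardCharsets.UTF_8));
        System.out.println(sha1Hex(accented, StandardCharsets.ISO_8859_1));
    }
}
```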


7. Are Additional Text Keys or Metadata Added?

NO. The hash is computed strictly from the JSON string representing the enricher inputs.

No hidden prefix, suffix, or text key is added to that string before hashing.


If multiple cached records exist for what appears to be the same input:

  • The input values may not be strictly identical.
  • There may be subtle differences (e.g., trailing spaces, null vs empty string, numeric formatting).
  • Further investigation would be required.
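As a starting point for that investigation, a small hypothetical helper can compare the raw inputs of two executions and display the differing values with their Java type and explicit delimiters, so that trailing spaces, null vs. empty strings, and Integer vs. Double values become visible. InputDiff and its methods are illustrative only and are not part of xDM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical diagnostic: lists the input names whose values differ between
// two executions and shows the values in a form that exposes hidden differences.
final class InputDiff {

    // Returns the input names (sorted) whose values differ or that are missing on one side.
    static List<String> differingInputs(Map<String, Object> a, Map<String, Object> b) {
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        List<String> diffs = new ArrayList<>();
        for (String n : names) {
            boolean inBoth = a.containsKey(n) && b.containsKey(n);
            if (!inBoth || !Objects.equals(a.get(n), b.get(n))) diffs.add(n);
        }
        return diffs;
    }

    // Wraps a value in its type name and brackets so spaces and types are visible.
    static String show(Object v) {
        if (v == null) return "null";
        return v.getClass().getSimpleName() + "[" + v + "]";
    }

    public static void main(String[] args) {
        Map<String, Object> run1 = new TreeMap<>();
        run1.put("stringInput1", "valueInput1");
        run1.put("stringInput2", "");
        run1.put("numericInput", 55);

        Map<String, Object> run2 = new TreeMap<>();
        run2.put("stringInput1", "valueInput1 ");
        run2.put("stringInput2", null);
        run2.put("numericInput", 55.0);

        for (String n : differingInputs(run1, run2)) {
            System.out.println(n + ": " + show(run1.get(n)) + " vs " + show(run2.get(n)));
        }
    }
}
```

Here all three inputs are reported: the number differs in type, stringInput1 has a trailing space, and stringInput2 is an empty string in one run and null in the other.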