Validating Your Data Model
Inconsistent, incorrect, or missing metadata can lead to problems when processing data or creating reports. Running the model scan validates your data model and alerts you to any issues. After creating a data model, a Space Administrator can run a model scan from the command line using the validatemetadata command.
Example metadata validation rules include:
Hierarchy names should be unique and level names should be unique within the hierarchy.
The expression for a Custom Attribute or Custom Measure cannot exceed 10 nested IF/ELSE statements as this creates more than 10 levels of nested case statements which is not supported by SQL Server.
To validate your data model against all rules, use the following syntax: validatemetadata all
The model validation scans the repository against a set of validation rules and provides a detailed report that describes any issues that are detected. You can run the scan multiple times against your data model. You can select the areas to check such as data sources, hierarchies, aggregates, and custom attributes or select specific rules to validate against your data model.
Rule Groups
validatemetadata rulegroup ruleGroupName
validatemetadata rulegroup "Data Sources"
- Aggregates
- Custom Attributes
- Custom Measures
- Data
Sources
- This validates the settings for data sources on the Manage Sources page.
- Data
Sources - Data Flow
- This validates settings for data sources that are specific to Data Flow, such as Exclude From Warehouse Model.
- Hierarchies
Rules
validatemetadata rule ruleName
validatemetadata rule IncrementalSnapshotFact
The following tables detail the validation rules sorted by Rule Group:
Rule Name |
Rule |
Description |
---|---|---|
Data Sources | ||
IncrementalSnapshotFact | Incremental snapshot fact | Delete key(s) should be marked when “Bulk Insert and Delete Measure Records” is enabled. And marked delete key should be present as either level key or measure on that stagingtable. |
LevelKeyDataTypeLengthCheckRule | Level keys exist and corresponding staging column should be marked as natural key | Level Key(s) of level as well as parent level(s) up to hierarchy should be marked as Natural key on StagingTable at level. |
LevelKeyNaturalKeyFlagChecksOnStagingTable | Level key column is present against a hierarchy level with valid Birst data type and length | Level key data type and length (in case of varchar) on corresponding staging table and on dimension table should be same. It is required because joins apply on that column. |
StagingColumnSourceColumnMappingChecks | SourceFileColumn referred in StagingColumn should exist in SourceFile. | SourceFileColumn referred in StagingColumn should exist in corresponding SourceFile. |
MeasureGrainCreationRule | Measure Grain isn't created as a result of internal naming collision issue | More than one source should not be at the same levelName(s) in different hierarchies. If two sources are on the same levelName(s) then only one fact table gets created instead of two due to name collision. |
LevelKeyColSourceColMapping | Level Keys are not removed from Repository after the column is removed from the Source |
All the level keys associated with staging table should be existed in corresponding source file associated with staging table. If the above condition is not met then list out those columns. |
ValidateETLScript | ValidateETLScript | Session variable is restricted from being used in ETL script. This rule will list down all the scripted sources for which session variable is being used in ETL script. |
ValidateDuplicateMeasureNameDatatype | Validate column (marked as measure) having same name but have different data types | Birst tries to create base facts from all the sources during building application. If the same name column is present under different sources with different data types, and marked as a measure (e.g. Integer and DateTime), Birst will store the fact as Varchar. So as best practice, measures with same name should be of same data type across all the sources, or else, measure name needs to be different. This rule will list out those entire same name columns with different data types and marked as a measure. |
HierarchyDimensionKeyLevel | Hierarchy Dimension Key Level | Dimension key level is the lowest level in the hierarchy |
HigherLevelLevelKeyNaturalKeyFlagChecksOnStagingTable | Day ID, if exists then it should be marked as natural key | Day ID, if exists then it should be marked as natural key |
ValidateDayIDColumn | SourceFileColumn field should not exist on "Day ID" staging colun, if transformation is set to V{LoadDate} | SourceFileColumn field should not exist on "Day ID" staging colun, if transformation is set to V{LoadDate} |
ColumnNameValidationForSourcesHavingSameGrain | Column_Name Validation For Sources Having Same Grain | Two data sources have the same grain, and have columns whose names differ only by underscores being used in one column name whereas spaces are used in the other column name shouldn't be allowed. |
StagingColValidationHavingSameNameAndMarkedOnSameHierarchy | StagingColumn validation having same name and marked on same hierarchy | Data type and width of source column having same name and marked on same hierarchy should be the same |
StagingColLevelKeyValidationHavingSameNameAndMarkedOnSameHierarchy | StagingColumn level key validation having same name and marked on same hierarchy | Data type and width of source column level key having same name and marked on same hierarchy should be the same |
OutOfRangeKeyIndicesOnSourceFile | Key Indices Validation on Source File | Key Indices mapped on a source file should be less than the number of columns in the source file |
Data Source - Data Flow | ||
SourceFieldsValidation | Validate source against ExcludFromModel and Disabled fields | List out sources which are excluded from model and disabled for loading |
Aggregates | ||
QueryAggregateNavigationRule | Query aggregate navigation | Birst should be able to navigate the query used in aggregate generation. If not then those aggregate will be listed out by this rule. |
Hierarchies | ||
CardinalityCheck | Level cardinality is non-zero positive integer | Cardinality set on each level should be non-zero positive integer |
HierarchyLevelRule | Hierarchy name is unique and hierarchy level name is unique within hierarchy | Hierarchy name should be unique and Level name should be unique within hierarchy. |
UniqueSurrogateKeyCheck | Availability of unique surrogate key column name | Surrogate key column name should be unique within hierarchy. |
Custom Attributes | ||
CustomAttributeRule | Custom Attribute derived from other custom attributes fails on improperly constructed CASE statement in SQL | Custom Attribute cannot be used to derive other custom attribute at the same level. |
CustomAttributeExprValidationRule | Custom Attribute should validate and warn me if I exceed the 10 nested IF statement restriction. | Expression of Custom Attribute should not be having more than 10 else if statement. It causes more than 10 level of nested case statement which is not supported by SQL server. |
Custom Measures | ||
CustomMeasureExprValidationRule | Custom Measure should validate and warn me if I exceed the 10 nested IF statement restriction. | Expression of Custom Measure should not be having more than 10 else if statement. It causes more than 10 level of nested case statement which is not supported by SQL server. |
Model Performance | ||
PerformanceWarnings | Warning rule for model performance, triggering a warning if the data model design is impacting performance. Performance checks are designed to be run against Enterprise spaces. Data Model Properties are not accessible for Professional spaces. | Provides warning messages when a model properties setting may affect performance in the space. The rule throws an error warning message if one of these criteria are met: All Time Series are enabled in a space by default, including both Default Time Series and Extended Time Series The Analyze By Date is enabled for all columns. The rule does not check the properties of any prepared pipeline sources, ignored sources, hidden sources, or Live Access sources in the space. |