Train your model
Out of the box, Buster will have a base-line performance. This baseline performance varies based on the complexity of your schema. To ensure production-ready performance, you will need to provide Buster with examples of correct question-SQL pairs.
Within a data source’s “Monitor & Fine-tune” tab, you can manage, create, and approve question-SQL pairs.
1. Ask a data question
If you’re using Buster for the first time, you can ask a data question by clicking the “New question” button. When you ask a question, Buster will generate SQL in 5-10 seconds. You can also click into previously asked questions from this tab’s table view.
2. Validate the SQL
Ensure that the generated SQL is correct and returns the correct data. If everything looks good, move to step 4. If the SQL is incorrect, move to step 3.
3. Fix the SQL
If the generated SQl is incorrect, you are able to edit it SQL directly in the code editor. You can then re-run the query to validate that it returns the correct data.
You are also able to edit the SQL by guiding Buster via the “Not quite right?” text box. This enables you to explain what is incorrect using natural language. When you send your message, Buster will attempt to fix the previously generated SQL.
4. Add it to your model
Once you click the ‘Add to model’ button, Buster immediately starts learning from the question-SQL pair. You can always come back and edit the summary question, SQL, or even remove the question-SQL pair from the model. The more pairs that you add to your model, the more performant your model will be.
Once you’ve added a question-SQL pair to your model it will be marked as ‘Added’ in the table view (found in your data source’s ‘Monitor & Fine-tune’ tab). If a question isn’t added to your model, it will be marked as “Not added”.
After fine-tuning your model, you may still encounter performance issues. This is common when when your database has lots of ‘gotcha’ moments (confusing schema names, calculations, etc that a new analyst would likely encounter).
A quick win that we see with many of our customers is simply cutting out unnecessary columns and tables that don’t answer end user questions. This reduces noise and enables Buster to deliver more accurate results.
The “Debug” tab can also be helpful for identifying issues, troubleshooting, and debugging. Within the “Debug” tab you can see which tables, SQL examples, stored values, etc are being used to answer the associated data question.
Below are some common troubleshooting issues that you may encounter.
If you notice that Buster is still making incorrect joins after defining joins in the “Joins & constraints” tab, it helps to add explicit instructions to the table descriptions.
A common situation we see is when a join is theoretically correct, but results in inaccurate data. If this is the case, you should add a note to the table description that explains how the table should/shouldn’t be joined.
Misinterpreting a column
If Buster is misinterpreting a column, you should double check the column description and make sure it aligns with the data in the column.
If the description is correct, but you are still seeing issues, it doesn’t hurt to add more detail to the column description. Some examples of where that is necessary:
- If the column is an enum that has less than 10 values, you should add the possible values to the description. If they are not text type enums and there is some sort of value mapping, you should add that as well.
- If the column is a JSON or nested type, you should add the schema of the JSON to the description.
- If the column is a date, you could add the format of the date to the description.
- If the column is a numeric type, you could add the range of the values to the description.
- If any nuanced terminology exists around a column, you should include that as well.
Misinterpreting a value
End users don’t always refer to text values in your database exactly as their stored. For example, if you had the value
rd_bll_12oz in your database and users ask for that value using “red bull”, Buster would generate a query that looks like:
SELECT * FROM sales WHERE product_name = 'red bull'
If you enable Buster to store and access a column’s unique text values, Buster will be able to search through the values and find the most relevant match/pattern for the user’s question. Resulting in a query that is correct:
SELECT * FROM sales WHERE product_name = 'rd_bll_12oz'
To enable this feature, you can toggle on the column in the “Stored Values” section of your data source: Select your data source -> Stored Values -> Toggle column on
Misinterpreting the entire schema
If you have completed all the onboarding and troubleshooting steps and are still seeing performance issues, you should simplify your schema by modeling tables into more digestible views. This is rarely the case. Typically, we only see this as a solution when your schema has so many ‘gotchas’ that you could never possibly define them all.
If you are considering this as a solution, please reach out to us at: firstname.lastname@example.org