Lynchpin

Independent analytics consultancy, using data science, data strategy and data engineering to launch you higher



Automated testing: Developing a data testing tool using Cursor AI 11 Dec 2024 8:17 AM (3 months ago)

In this blog, we discuss the development of an automated testing project, using the AI and automation capabilities of Cursor to scale and enhance the robustness of our data testing services. We walk through project aims, key benefits, and considerations when leveraging automation for analytics testing.


Project Background:

We manage a JavaScript library that is deployed to numerous sites, and we upgrade it on an ongoing basis to include improvements and enhancements. The library integrates with different third-party web analytics tools and performs a number of data cleaning and manipulation actions. Once we upgrade the library, our main priorities are:

Feature testing: Verify new functionality across different sites/environments

Regression testing: Ensure existing functionality has not been negatively affected across different sites

To achieve this, we conduct a detailed testing review across different pages of the site. This involves performing specific user actions (such as page views, clicks, search, and other more exciting actions) and ensuring that the different events are triggered as expected. We capture network requests to outgoing vendors, such as Adobe Analytics or Google Analytics, through the browser’s developer tools or a network debugging tool (e.g. Charles), and verify whether the correct events are triggered and the relevant parameters are captured accurately in the requests. By ensuring that all events are tracked with the right data points, we can confirm that both new features and the existing setup are working as expected.

Project Aim:

To optimise this process and reduce the manual effort involved, we developed an automated testing tool designed to streamline and speed up data testing. As an overview, this tool automatically simulates user actions on different sites and different browsers, triggering the associated events, and then checks network requests to ensure that the expected events are fired, and the correct parameters are captured.

Automated Testing Benefits:

In the era of AI, automation is a key driver of efficiency and increased productivity. Automating testing processes offers several key benefits to our development and data testing capabilities, such as:

  • Reduced setup time and testing documentation effort: We’re able to run through different tests and scenarios with a one-time setup for each site and each version.
  • More accurate data testing: With a well-thought-out test plan that is followed precisely, we’re able to put more trust in our testing outcomes. This helps us identify issues more quickly.
  • Better test coverage: We can run tests on different browsers and devices, using the same setup.

How We Did It:

We chose Python as the primary scripting language, as it offers flexibility for handling complex tasks. Python’s versatility and extensive libraries made it an ideal choice for rapid development and iteration.

For simulating a variety of user interactions and conducting tests across multiple browsers, we selected Playwright. Playwright is a powerful open-source browser automation tool and API. It supports cross-browser data testing (including Chrome, Safari, and Firefox), allowing us to validate network requests across a broad range of environments.
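To illustrate how this fits together, here is a minimal sketch (not our production tool) of a Playwright check that loads a page in several browsers, captures outgoing requests, and verifies that the expected analytics hit fired with the right parameters. The site URL, vendor endpoint, and expected parameters below are placeholder values:

from urllib.parse import urlparse, parse_qs
from playwright.sync_api import sync_playwright

# Placeholder values - the real tool reads these from per-site, per-version configuration
SITE_URL = "https://www.example.com"
VENDOR_ENDPOINT = "google-analytics.com/g/collect"   # GA4 collection endpoint
EXPECTED = {"en": "page_view"}                        # expected event name parameter

def page_view_fires(browser_name: str) -> bool:
    """Load the page in the given browser and check the expected hit was sent."""
    captured = []
    with sync_playwright() as p:
        browser = getattr(p, browser_name).launch()   # "chromium", "firefox" or "webkit"
        page = browser.new_page()
        # Record every outgoing request so we can inspect vendor hits afterwards
        page.on("request", lambda request: captured.append(request.url))
        page.goto(SITE_URL)
        page.wait_for_load_state("networkidle")
        browser.close()

    for url in captured:
        if VENDOR_ENDPOINT in url:
            params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
            if all(params.get(k) == v for k, v in EXPECTED.items()):
                return True
    return False

# Run the same check across browser engines (WebKit covers Safari, Chromium covers Chrome)
for name in ["chromium", "firefox", "webkit"]:
    print(name, "page_view OK" if page_view_fires(name) else "page_view MISSING")

In practice, equivalent checks wrap the other simulated interactions (clicks, search and so on), with the expected events and parameters defined per site and per library version.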

We used the Cursor AI code editor to optimise the development process and quickly set up the tool. Cursor’s proprietary LLM, optimised for coding, enabled us to design and create scripts efficiently, accelerating development by streamlining the debugging and iteration process. Cursor’s AI assistant (chat sidebar) boosted productivity by providing intelligent code suggestions, speeding up debugging and investigation. We’ll dive further into our experience using Cursor in the next section.

Lastly, we chose Flask to build the web interface where users can select different types of automated testing. Flask is a lightweight web framework for Python, which we’ve had experience with on other projects. It has its pros and cons, but a key benefit for this project was that it allowed us to get started quickly and focus more on the nuts and bolts of the program.
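As a rough sketch of that interface layer (simplified, with hypothetical test names rather than our actual configuration), a small Flask app just lists the available test types and hands the selection to the test runner:

from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical test registry - the real tool maps these to Playwright test runs
AVAILABLE_TESTS = ["page_view", "click_tracking", "site_search"]

PAGE = """
<form method="post">
  {% for t in tests %}
    <label><input type="checkbox" name="tests" value="{{ t }}"> {{ t }}</label><br>
  {% endfor %}
  <button type="submit">Run selected tests</button>
</form>
{% if results %}<pre>{{ results }}</pre>{% endif %}
"""

@app.route("/", methods=["GET", "POST"])
def index():
    results = None
    if request.method == "POST":
        selected = request.form.getlist("tests")
        # In the real tool this is where the automated test runner is kicked off
        results = f"Would run: {', '.join(selected) or 'nothing selected'}"
    return render_template_string(PAGE, tests=AVAILABLE_TESTS, results=results)

if __name__ == "__main__":
    app.run(debug=True)

Keeping the web layer this thin meant most of the effort could go into the test logic itself.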

Our Experience with Cursor:

Cursor AI played a crucial role in taking this project from ideation to MVP. By carefully prompting Cursor’s in-editor AI assistant, we were able to achieve the results we wanted. The tool allowed us to focus on the core structure of the program and the logic of each test without getting bogged down in documentation and finicky syntax errors.

Cursor also gave us the capability to include specific files, documentation links, and diagrams as context for prompts. This allowed us to provide relevant information for the model to find a solution. Compared to an earlier version of GitHub Copilot that we tested, we thought this was a clear benefit in leading the model to the most appropriate outcome.

Another useful benefit of Cursor AI was the automated code completion, which could identify bugs and propose fixes, as well as suggest code to add to the program. This feature was useful when it understood the outcome we were aiming for, which it did more often than not.

However, not everything was plain sailing, and our experience did reveal some drawbacks of using AI code editors to be mindful of. For example, relying too much on automated suggestions can distance you from the underlying code, making it harder to debug complex issues independently. It was important to review the suggested code and use Cursor’s helpful in-editor diffs to clearly outline the proposed changes. This also allowed us to accept or reject these changes, giving us a good level of control.

Another drawback we noticed is that AI-generated code may not always follow best practices or be optimised for performance, so it’s crucial to review and validate the output carefully. For example, Cursor tended to create monolithic scripts instead of separating functionality into components, such as tests and Flask-related parts, which would be easier to manage in the long term.

Another point we noticed was that over-reliance on AI tools could easily lead to complacency, potentially affecting our problem-solving skills and creativity as developers. When asking Cursor to make large changes to the codebase, it can be easy to just accept all changes and test if they worked without fully understanding the impact. When developing without AI assistance (like everyone did a couple of years ago), it’s better to make specific and relatively small changes at a time to reduce the risk of introducing breaking changes and to better understand the impact of each change. This seems to be a sensible approach when working with a tool like Cursor.

What We Achieved – Efficiencies Unlocked:

The automated testing tool we developed significantly streamlined and optimised the data testing process in a number of key ways:

  • Accelerated project development: Using Cursor AI, we rapidly moved through development and completed the project in a short period. The AI-driven interface, combined with Playwright’s capabilities, sped up our debugging process—a major challenge in previous R&D projects. In the past, we often faced delays due to debugging blockers, but now, with the AI assistant, we could directly identify and fix issues, completing the project in a fraction of the time.
  • Built a robust, reusable tool: The tool is scalable and flexible, and can be adapted for different analytics platforms (e.g., Google Analytics, Meta, Pinterest). It is reusable across different projects and client needs, as well as different browsers and environments.
  • Time efficiency & boosted productivity: One of the most valuable outcomes was the significant reduction in manual testing time. With the new automated testing tool, we ran multiple test cases simultaneously, speeding up the overall process. This helped us meet tight project deadlines and improve client delivery without sacrificing quality. Additionally, it freed up time for focusing on challenging tasks and optimising existing solutions.


Conclusion:

With AI, the classic engineering view of ‘why spend 1 hour doing something when I can spend 10 hours automating it?’ has now become ‘why spend 1 hour doing something when I can spend 2-3 hours automating it?’. In this instance, Cursor allowed us to lower the barrier for innovation and create a tool to meet a set of tight deadlines, whilst also giving us a feature-filled, reusable program moving forwards.


For more information about how we can support your organisation with data testing – including our automated testing services – please feel free to contact us now or explore the links below

CONTACT US | VIEW OUR CAPABILITIES | VIEW OUR SERVICES


(Video) Applying RegEx filtering in Looker Studio to clean up and standardise GA4 reporting 15 Nov 2024 7:36 AM (4 months ago)

In the latest episode of our ‘Calling Kevin’ video series, we show you how to clean up and filter URLs using a few simple expressions in Looker Studio.

By applying these Regular Expressions (RegEx), you can easily remove duplicates, fix casing issues, and tidy up troublesome URL data to standardise GA4 reporting – just as you would have been able to in Universal Analytics.

Expressions used:

  1. To remove parameters from a page path: REGEXP_EXTRACT(Page, "^([\\w-/\\.]+)\\??")
  2. To remove a trailing slash from a page path: REGEXP_REPLACE(Page, "(/)$", "")
  3. To make a page path lowercase: LOWER()
    Combined: LOWER(REGEXP_REPLACE(REGEXP_EXTRACT(Page path + query string, "^([\\w-/\\.]+)\\??"), "(/)$", "")) (see the worked example below)
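To show what the combined expression actually does (sketched in Python purely to illustrate the logic; the Looker Studio formulas above use RE2-based functions, so the syntax differs slightly), the cleanup drops query parameters, strips the trailing slash and lowercases the path:

import re

def clean_page_path(page: str) -> str:
    match = re.match(r"^([\w\-/.]+)\??", page)
    path = match.group(1) if match else page   # 1. drop query parameters
    path = re.sub(r"/$", "", path)             # 2. remove the trailing slash
    return path.lower()                        # 3. normalise casing

print(clean_page_path("/Blog/Post-1/?utm_source=newsletter"))  # -> /blog/post-1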

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.


Webinar: Navigating Recent Trends in Privacy, Measurement & Marketing Effectiveness 3 Oct 2024 1:54 AM (6 months ago)

How do you know what’s working and not working and plan for success as the tides of digital measurement continue to change?

The themes of privacy, measurement and marketing effectiveness triangulate around a natural trade and tension: balancing the anonymity of our behaviours and preferences against the ability for brands to reach us relevantly and efficiently.

In this briefing our CEO, Andrew Hood, gives you a practical and independent view of current industry trends and how to successfully navigate them.

Want the full-length white paper?


Building on the themes introduced in the webinar, our white paper lays out an in-depth look at the privacy trends, advanced measurement strategies, and balanced approach you can take to optimise marketing effectiveness.

Unlock deep-dive insight and practical tips you can begin implementing today to guide your focus over the coming months.

VIEW NOW

To access a copy of the slides featured in the webinar, click the button below

View presentation slides


Benefits of marketing mix modelling: Why is MMM so popular right now? 5 Sep 2024 5:01 AM (7 months ago)

Introduction

The concept of marketing mix modelling (often referred to as just ‘MMM’) has been around for a while – as early as the 1960s in fact – which should be no surprise, as the business challenge of what marketing channels to use and where best to spend your money has always been the essence of good marketing, at least if somebody is holding you accountable for that spend and performance!

Marketing mix modelling has its foundations in statistical techniques and econometric modelling, which still holds largely true today. However, the mix of channels and advancements in end-to-end analytics create new challenges to be tackled, not least the expectations of what MMM is and what it can deliver.
In reality, there are various analytics techniques that can be undertaken to answer the overall business question: ‘how do my channels actually impact sales?’. In this blog we will answer some common questions about MMM, address some common (comparable) techniques, and share how and when you might look to choose one method over another.

What is marketing mix modelling?

MMM is a statistical technique, with its roots in regression, that aims to analyse the impact of various marketing tactics on sales over time (other KPIs are also available!). Marketing mix modelling will consider all aspects of marketing to do this, such as foundational frameworks like ‘The 4 Ps of Marketing’ (Product, Price, Place and Promotion).
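To make that concrete, a deliberately simplified, generic form of the underlying regression might look like:

$$\text{Sales}_t = \beta_0 + \sum_{i=1}^{n} \beta_i \, f_i(\text{Spend}_{i,t}) + \gamma \, \text{Controls}_t + \varepsilon_t$$

where each $f_i$ applies transformations such as adstock (carry-over) and saturation (diminishing returns) to channel $i$’s spend, and the control terms capture factors like price, seasonality and macroeconomic conditions. Real models vary considerably in how these terms are specified.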

MMM is similar to econometric modelling in terms of techniques used, however there are some key differences. On the whole, econometrics is broader in its considerations and applications, often encompassing aspects of general economic factors in relation to politics, international trade, public policy and more. MMM, on the other hand, focusses more specifically on marketing activities and their impact on business outcomes.

You might also come across the term ‘media mix modelling’ (with the same unhelpful acronym, ‘MMM’). Much like econometrics, media mix modelling tends to differ from marketing mix modelling in its scope and general objective. Media mix modelling has an even narrower focus than marketing mix modelling; as the name implies, it is aimed more specifically at optimising the mix of media channels and advertising spend.

Whether it’s marketing mix modelling or media mix modelling you are looking at, the key is to consider the business question you are looking to answer and ensure your model is trained using the best input variables to answer that question – nothing new in the world of a good analytics project!

Why does MMM seem to be gaining traction recently?

In recent years, the general trend has been to measure everything, integrate everything, and to link all of your data together, leaving no doubt about who did what, when, and to what end. However, increasing concerns (or at least considerations) around data privacy and ethics have caused marketers to take a second look at how they collect and utilise their data.
There is a growing need to adapt to new privacy regulations, but also a greater desire to respect an individual’s privacy and find better ways to understand what marketing activities drive positive or negative outcomes.

With limitations on the ability to track third-party cookies, approaches such as marketing attribution may become more difficult to implement, although the effectiveness of these data sources is in itself doubtful. And with consent management becoming increasingly granular, even first-party measurement can leave gaps in your data collection.
However, the power that marketing attribution gave marketers is well recognised now and the desire to continue to be data-led is only increasing. Machine learning has become a commonplace tool in beginning to fill the gaps that are creeping back into the tracking of user behaviour. Organisations are also increasingly eager to build on the power of what they have learnt from these joined-up customer journeys, and there is that need again to look across the whole of marketing, not just digital touchpoints, and replicate that approach in a more holistic way.

So in summary, while marketing mix modelling has never gone away, it is now seeing a revival as an essential tool in a marketer’s toolbelt.

The benefits of MMM: Why should organisations consider using marketing mix modelling services?

MMM is a great tool for any organisation looking to be more data-led in their approach to planning and analysis of marketing activities. Key benefits of MMM include:

Ability to measure and optimise the effectiveness of marketing and advertising campaigns:
The purpose of MMM is to measure the impact of your marketing activities on your business outcomes. A well-built marketing mix model will enable you to quantify ROI by channel and make better data-led decisions on the mix of marketing activities that will lead to more optimised campaigns.

Natural adeptness at cross-channel insights:
With increasing limitations on tracking users across multiple channels, the methodology for MMM neatly sidesteps these restrictions by using data at an aggregated level. By its very nature it doesn’t require linking user identities across different devices or tracking individuals using offline channels.

Enables more strategic planning and budgeting:
MMM provides data-driven insight to inform budget planning processes. Its outputs are transparent, allowing organisations to understand the impact each of their channels have on business outcomes and how those channels influence each other within the mix. By incorporating MMM with other tools for scenario planning, spend optimisation and forecasting, organisations can better understand what happened in the past to plan more effectively for the future.

Can be used when granular level data is not available:
As mentioned earlier, MMM works with data at an aggregated level. This offers more flexibility when looking to integrate data inputs into your decision making such as:

  • Linking offline activity with online sales
  • Linking online activity with offline sales
  • Understanding the impact of external influences such as macroeconomic factors, seasonality, competitor activities, etc.

Has a longer-term focus:
MMM is a powerful technique for longer term planning and assessing the impact of campaigns that don’t necessarily provide immediate impact (e.g. brand awareness campaigns, TV, and display advertising etc). By incorporating MMM into a measurement strategy, businesses can ensure longer-term activity is appropriately considered.

Marketing mix modelling vs. marketing attribution modelling: How do they differ? What are the pros and cons?

Earlier in this blog we looked at how marketing mix modelling compares to econometrics and media mix modelling. Another very important modelling approach to consider when looking at marketing effectiveness is marketing attribution.
Marketing attribution differs from marketing mix modelling in a number of important ways – most importantly by relying on a more granular approach. It looks to assign weightings to each individual touchpoint on the customer journey, incorporating each user’s journey and determining whether that journey leads to a successful conversion or not.
This very detailed understanding of how each customer interacts with your channels can be very powerful, but also very complex and time-consuming to both collect and analyse. In addition, with the increasing limitations on tracking individuals without their consent, you may end up having to rely on only a partial picture of the user journey.
While of course it is possible to model on a subset of data, you would need to be careful that the user journey you are looking to understand is not unfairly weighted to those channels (or individuals) that are easier to track.
Marketing attribution also uses a wider range of modelling algorithms, from the simple (linear, time-decay) to the more complex (Markov Chains, Game Theory, ML models). This range of models to select from can be both a benefit and a hindrance, with difficulties arising when you’re not sure which marketing attribution model will suit your business needs best.

Marketing mix modelling does have its own drawbacks to consider too. The biggest consideration when determining if MMM is suitable for you is to understand how much historical data you have.
While a marketing attribution model can work on just a few months of data, so long as it has decent volume and is fairly representative of your typical user journeys, MMM relies on trends over longer periods of time – typically a minimum of 2 years’ worth of data is advised before undertaking an MMM project. MMM also works best when looking at the broader impacts marketing has on your goals. Therefore, if you need to analyse specific campaign performance or delve deeper into specific channels, then marketing attribution will be the better bet.

Can MMM and marketing attribution complement each other?

In a previous blog, we discussed the merits of using both marketing attribution and MMM side by side to provide a more powerful and comprehensive understanding of marketing effectiveness.
While a marketing attribution model will focus on individual touchpoints and their contributions, MMM will take a holistic view, considering the overall impact of marketing inputs. By combining these two approaches, marketers can gain a more complete picture of how different marketing elements work together to drive business outcomes, and demystify the balance needed across marketing activity for maximum business performance.

Summary

Marketing mix modelling is a very powerful and well-established statistical technique. Most marketers should be at least exploring the benefits and insight it provides into the relationship between marketing activity and business performance to optimise planning and decision making.

One barrier to entry when starting an MMM project can be navigating what may appear to be a complex set of approaches and techniques. While variations of MMM do exist – econometrics, marketing mix modelling, and media mix modelling – the key difference lies in the scope and objective of the business question you aim to answer. Successfully choosing and developing a model depends on fully understanding your business needs and the data available to you. Investing time upfront to determine what you are looking to achieve is essential in getting the right outcomes.

MMM is best used for strategic planning and determining longer term impacts of your marketing activities. Therefore, if you require more in-depth campaign and channel analysis, then marketing attribution may be more suitable for your business needs. However, it’s important to note that MMM and marketing attribution can work side by side to develop a more complete picture of your marketing activities. While MMM allows greater flexibility when working with a mix of channels that are both tracked and not tracked, the ability of marketing attribution to provide a more granular analysis of your marketing journeys, channels, and campaigns allows for day-to-day optimisation of your marketing activities alongside the longer-term strategy set out by your MMM insights.


If you are ready to explore MMM, marketing attribution, or anything in between, we’d be delighted to discuss your needs in more detail.

For more information on boosting marketing effectiveness and how our team can assist with topics raised in this blog, please visit the links below

Register for our Webinar: ‘Navigating Recent Trends in Privacy, Measurement & Marketing Effectiveness’ on 25 Sept 2024 | VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US


Google (finally) supports Custom Event Data Import in GA4 17 Jul 2024 3:06 AM (8 months ago)


Google has recently updated their GA4 ‘Data Import’ feature to finally support Custom Event metadata. This is a significant development, but before we dive in, let’s remind ourselves of a key point: Despite its name, which can give false hope after an outage, ‘Data Import’ is NOT a solution for repopulating lost data. It is however a powerful tool for augmenting existing data with information that isn’t directly collected in GA4. Common sources that we find our clients wanting to integrate include CRM systems, offline sales data, or other third-party analytics tools.

When would Custom Event Data Import be useful?

Well, there are many cases:

The information we import might not be available until after collection. This could include data that is processed or generated by third-party tools after the event has already occurred. A prime example would be cost data for non-Google ad clicks and impressions.

Some information might not be something we want exposed on our site. Importing such data ensures it remains secure and is only used for internal analysis. These might include things like a product’s wholesale price, or a user’s lifetime customer value.

Information collected offline, such as in-store purchases or interactions, could be integrated with your existing GA4 data to allow for a more complete view of customer behaviour across both online and offline touchpoints.

Although Data Import supported Cost, Product, and User-scoped data, what was conspicuously absent up until now was the ability to import data directly scoped to existing Custom Events. This is particularly significant because, as Google likes to remind us, GA4 is ultimately event-based.

To understand if this development could be useful for you, consider the events you already track. Is there any information directly related to these events and their custom dimensions that you don’t collect in GA4, but have available offline or in another tool? If so, Custom Event data import could be very handy.

It’s been a long and somewhat painful journey with GA4, but it’s great to see it gradually becoming feature complete.

Of course, if you’re looking to augment your GA4 data with information available at the point of collection, Lynchpin would recommend harnessing the power of a server-side GTM implementation to enrich your GA4 data before it even arrives in GA4 itself.

For more information on server-side GTM and its advantages we highly recommend reading the blogs below:


To discuss any of the topics mentioned in this blog or to find out how Lynchpin can support you with any other data and analytics query, please do not hesitate to reach out to a member of our team.

For more information about what we do

VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US


Working with dbt & BigQuery: Some issues we encountered and their solutions 2 Jul 2024 12:51 AM (9 months ago)

Here at Lynchpin, we’ve found dbt to be an excellent tool for the transformation layer of our data pipelines. We’ve used both dbt Cloud and dbt Core, mostly on the BigQuery and Postgres adapters.

We’ve found developing dbt data pipelines to be a really clean experience, allowing you to get rid of a lot of boilerplate or repetitive code (which is so often the case writing SQL pipelines!).

It also comes with some really nice bonuses like automatic documentation and testing along with fantastic integrations with tooling like SQLFluff and the VSCode dbt Power User extension.

As with everything, as we’ve used the tool more, we have found a few counter-intuitive quirks that left us scratching our heads a little bit, so we thought we’d share our experiences!

All of these quirks have workarounds, so we’ll share our thoughts plus the workarounds that we use.


Summary:

  1. Incremental loads don’t work nicely with wildcard table queries in BigQuery
  2. The sql_header() macro is the only way to do lots of essential things and isn’t really fit for purpose.
  3. Configuring dev and prod environments can be a bit of a pain


1. Incremental loads with wildcard tables in BigQuery

Incremental loads in dbt are a really useful feature that allows you to cut down on the amount of source data a model needs to process. At the cost of some extra complexity, they can vastly reduce query size and the cost of the pipeline run.

For those who haven’t used it, this is controlled through the is_incremental() macro, meaning you can write super efficient models like this.

SELECT *
FROM my_date_partitioned_table
{% if is_incremental() %}
WHERE date_column > (SELECT MAX(date_column) FROM {{ this }})
{% endif %}

This statement is looking at the underlying model and finding the most recent data based on date_column. It then only queries the source data for data after this. If the table my_date_partitioned_table is partitioned on date_column, then this can have massive savings on query costs.

Here at Lynchpin, we’re often working with the GA4 → BigQuery data export. This free feature loads a new BigQuery table events_yyyymmdd every day. You can query all the daily export tables with a wildcard * and also filter on the tables in the query using the pseudo-column _TABLE_SUFFIX

SELECT
*
FROM
`lynchpin-marketing.analytics_262556649.events_*`
WHERE
_TABLE_SUFFIX = '20240416';

The problem is incremental loads just don’t work very nicely with these wildcard tables – at least not in the same way as a partitioned table in the earlier example.

-- This performs a full scan of every table - rendering
-- incremental load logic completely useless!
SELECT *,
_TABLE_SUFFIX as source_table_suffix
FROM `lynchpin-marketing.analytics_262556649.events_*`
{% if is_incremental() %}
WHERE _TABLE_SUFFIX > (SELECT MAX(source_table_suffix) FROM {{ this }})
{% endif %}

This is pretty disastrous because scanning every daily table in a GA4 export can be an expensive query, and running this every time you load the model doesn’t have great consequences for your cloud budget 💰.

The reason this happens is down to a quirk in the query optimiser in BigQuery – we have a full explanation and solution to it at the end of this blog 👇 if you want to fix this yourself.


2. You have to rely on the sql_header() macro quite a lot

The sql_header() macro is used to run SQL statements before the code block of your model runs, and we’ve actually found it to be necessary in the majority of our models. For instance, you need it for user-defined functions, declaring and setting script variables, and for the solution to quirk #1.

The problem is that the sql_header() macro isn’t really fit for purpose, and you run into a few issues:

  • You can’t use macros or Jinja inside sql_header() as it can lead to weird or erroneous behaviour, so no using ref, source or is_incremental() for example
  • You can’t include your sql_header() configurations in tests, currently meaning any temporary functions created can’t be recreated in test cases


3. There are a few workarounds needed to truly support a dev and prod environment

dbt supports different environments, which can be easily switched at runtime using the --target command-line flag. This is great for keeping a clean development environment separate from production.

One thing we did find a little annoying was configuring different data sources for your development and production runs, as you probably don’t want to have to run on all your prod data every time you run your pipeline in dev. Even if you have incremental loads set up, a change to a table schema soon means you need to run a full refresh which can get expensive if running on production data.

One solution is reducing the amount of data using a conditional like so:

{% if target.name == 'dev' %}
        AND date_column BETWEEN
        TIMESTAMP('{{ var("dev_data_start_date") }}')
        AND TIMESTAMP('{{ var("dev_data_end_date") }}')
{% endif %}

This brings in extra complexity to your codebase and is annoying to do for every single one of your models that query a source.

The best solution we saw to this was here: https://discourse.getdbt.com/t/how-do-i-specify-a-different-schema-for-my-source-at-run-time/561/3

The solution is to create a dev version of each source in the YAML file, named <source name>_dev (e.g. my_source_dev for the dev version of my_source), and then have a macro that switches the source based on the target value at runtime.

Another example in this vein: getting dbt to enforce foreign key constraints requires this slightly ugly expression switching between schemas in the schema.yaml file

- type: foreign_key
  columns: ["blog_id"]
  expression: "`lynchpin-marketing.{{ 'ga4_reporting_pipeline' if target.name != 'dev' else 'ga4_reporting_pipeline_dev' }}.blogs` (blogs)"


Explanation and solution to quirk #1

Let’s revisit

SELECT
  *
FROM
  `lynchpin-marketing.analytics_262556649.events_*`
WHERE
  _TABLE_SUFFIX = '20240416';

This is fine – the table scan performed here only scans tables with suffix equal to 20240416 (i.e. one table), and bytes billed is 225 KB

OK, so how about only wanting to query from the latest table?

If we firstly wanted to find out the latest table in the export:

-- At time of query, returns '20240416'
SELECT
    MAX(_TABLE_SUFFIX)
  FROM
    `lynchpin-marketing.analytics_262556649.events_*`

This query actually has no cost!

Great, so we’ll just put that together in one query:

SELECT
  *
FROM
  `lynchpin-marketing.analytics_262556649.events_*`
WHERE
  _TABLE_SUFFIX = (
  SELECT
    MAX(_TABLE_SUFFIX)
  FROM
    `lynchpin-marketing.analytics_262556649.events_*`)

Hang on… what!?

BigQuery’s query optimiser isn’t smart enough to get the value of the inner query first and use that to reduce the scope of tables queried in the outer query 😟

Here’s our solution, which involves a slightly hacky way to ensure the header works in both incremental and non-incremental loads. We implemented this in a macro to make it reusable.

{% call set_sql_header(config) %}
DECLARE table_size INT64;
DECLARE max_table_suffix STRING;
SET table_size = (SELECT size_bytes FROM {{ this.dataset }}.__TABLES__ WHERE table_id='{{ this.table }}');
IF table_size > 0 THEN SET max_table_suffix = (SELECT MAX(max_table_suffix) FROM {{ this }});
ELSE SET max_table_suffix = '{{ var("start_date") }}';
END IF;
{% endcall %}
-- Allows for using max_table_suffix to filter source data.
-- Example usage:
SELECT
        *
FROM {{ source('ga4_export', 'events') }}
{% if is_incremental() %}
    WHERE _table_suffix > max_table_suffix
{% endif %}


We hope you found this blog useful. If you happen to use any of our solutions or come across any strange quirks yourself, we’d be keen to hear more!

To find out how Lynchpin can support you with data transformation, data pipelines, or any other measurement challenges, please visit our links below or reach out to a member of our team.

To find out how Lynchpin can help

VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US


(Video) ‘(not set)’ landing page values in GA4: Explained 13 Jun 2024 3:42 AM (10 months ago)

In the latest episode of our ‘Calling Kevin’ video series, our Senior Data Consultant, Kevin, tackles a common issue many users face in Google Analytics: an increase to the number of ‘(not set)’ landing page values in GA4.

In the video below, Kevin covers:

  • How session timeouts lead to ‘(not set)’ values
  • How automatic event collection affects landing page data.
  • Tips for checking and fixing your GA4 setup.

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.


(Video) Breaking down user metrics in GA4 23 May 2024 3:20 AM (10 months ago)

In the latest from our ‘Calling Kevin’ video series, our Senior Data Consultant, Kevin, covers a few common questions about user metrics in GA4.

In this quick walkthrough, Kevin dishes on the definitions and differences between the user metrics available in GA4 – some of which may be familiar to those experienced with Universal Analytics, while some changes are exclusive to GA4, causing some confusion for Google Analytics users both new and old.

  • How are Users defined in standard reports?
  • What is an Active user in GA4?
  • Are Users and Total users the same?

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.


Data & Analytics for Financial Services: 6 Ways to Unlock Further Value 20 Feb 2024 7:44 AM (last year)

In today’s rapidly evolving financial landscape, leaders face a myriad of challenges – from navigating market fluctuations and regulatory scrutiny to grappling with the impacts of changing customer behaviour and increased competition. Now, more than ever, financial services leaders must fully utilise the power of data and analytics to drive informed decision-making and stay ahead of the curve.

As technology continues to be a disruptive force, leaders are also under pressure to embrace innovations in AI, automation, and cloud computing while also keeping a handle on day-to-day challenges concerning growth, efficiency, and experience. Amidst this turbulence, agility is key as organisations must adapt to quickly changing market dynamics and disruptive trends.

This blog will explore practical data and analytics solutions you can deploy to address these needs, as you navigate the increasingly demanding financial services environment.


1. Building a data strategy that enables growth

Growth and insight go hand in hand. A suitable data strategy both stimulates and supports growth and considers your organisation’s unique circumstances throughout.
Whether you are rolling out new cloud-based analytics processes, joining disparate data sources for seamless workflows, deploying analytics testing to guarantee consistent insight delivery, or creating the analytics tools and visualisations for confident planning and decision-making – understanding what insight you need to enable growth and to continually support growth requires experience and clear alignment of the people, skills, planning, processes, and tools to get you there efficiently.

2. Maximising insight

There exists a wealth of attitudinal, behavioural, transactional, and basic data throughout your organisation. You may be using most of it already, but where you are not, having the right level of analytics capability and resourcing available ensures you are using all potential data points to maximise insight and strengthen data quality.
Using processes such as data auditing, data integration, data transformation, analytics reporting and data science, you and your team have access to a foundation of sophisticated and multi-dimensional statistics on users, behaviour, and performance.

3. Demystifying and defining performance with custom reporting and BI

What does good performance look like for your specific organisation? How does your team, your company, and your stakeholders define ‘good’?
Customising reporting and BI to tailor-fit the unique nature of your organisation is often a necessity when analysing performance and gaining the deep-dive information you need for planning and decision-making. With the correct vantage points on your business in place, you can confidently craft your strategy for navigating an evolving and ever-demanding landscape – from marketing, contact centres, risk, pricing, and all other areas of the business in need of consistent reporting and insight.

4. Taking a data-led approach to personalisation

Using the right data collection and transformation processes, organisations can leverage the insight they generate to build a personalisation strategy that stimulates loyalty and supports growth.
For example, by knowing the optimal time to deliver personalised messaging to a segment of users who are expected to churn, or by knowing which products and services are most likely to be successfully up-sold to which users, businesses have the power to create meaningful interactions, form deeper relationships, and confidently diversify messaging to better serve wants and needs of their customers in a trusted and mutually beneficial way.

5. Creating opportunities to optimise spend and enable marketing efficiency

Utilising the right mix of analytics, organisations can maximise the efficacy of their resources and pinpoint areas where their investments yield the greatest returns, thereby uncovering all-important problem areas.
By leveraging techniques such as attribution, MMM, segmentation, predictive analytics, and more, organisations can develop a comprehensive understanding of their brand, their customers, and sales and marketing performance.
Armed with a blend of insights, leaders can identify priorities and remove inefficiencies in their strategy and processes to incrementally improve performance, not only in the short term but for long-term, sustainable growth.

6. Leveraging AI to protect and support your brand

AI-driven technology may be more accessible than you think. AI-powered tools can be implemented in your organisation to do the heavy-lifting in terms of safeguarding your brand and surfacing the insight most critical for your team to action.
Whether you are using sentiment analysis techniques to monitor your brand reputation on social media and surface common pain points that your customers are raising on review websites or via market research initiatives, or using machine learning and automation processes to detect unusual user behaviour and flag risk – finding the right opportunities to implement AI-driven technology into your workflow guarantees added peace of mind against threats, affording you the time and resourcing to respond to issues before they become larger challenges.


As leaders in financial services navigate a rapidly evolving market, it is evident that capitalising on data and analytics is not just advantageous but imperative for remaining competitive. The insights produced from a robust data strategy, coupled with the ability to adapt to technological advancements, regulatory changes, market fluctuations, and evolving customer demands, empower leaders to make informed decisions.

As we move forward, embracing data-driven approaches will be the cornerstone of success for financial services leaders wanting to make calculated and impactful decisions to thrive in their dynamic landscapes.


Speak to our team


If you would like to explore any of the themes raised in this article, please get in touch. We don’t have salespeople and your first point of contact will be a subject matter expert from our leadership team.


Reviewing AWS Step Functions: The benefits and limitations 22 Jan 2024 8:42 AM (last year)


Introduction

We began to explore the role of AWS Step Functions during a complex data transfer project for a client. A key highlight of AWS Step Functions is its user-friendly drag-and-drop visual composer, a much-appreciated feature when dealing with extensive scripts. This blog post aims to discuss some benefits and limitations encountered while utilising the service.

Step Functions, a core component of AWS, provides a serverless orchestration service. It enables the coordination of multiple AWS services into adaptable workflows, referred to as ‘state machines’. This service facilitates the creation of complex business processes by integrating AWS Lambda, Amazon S3, and Amazon DynamoDB, among others.

Here are some example use cases where AWS Step Functions can be applied in a digital marketing context:

  1. Marketing attribution – Creating a state machine that does the necessary data transforms by calling AWS Glue jobs, which then feeds into an AI notebook in SageMaker where the attribution model runs. Finally, the output can be directed to Amazon Relational Database Service (RDS) ready for consumption.
  2. Server-side tagging – With Step Functions’ scalable properties, it’s possible to handle each request coming through from a website, making changes/additions using Lambda functions before pushing to a server-side tag management system. While many tag managers provide the ability to make changes on the fly, this setup offers the entire fleet of AWS tools for modifications.
  3. Collating data from APIs – AWS Step Functions can be used to aggregate data from various external APIs. By creating a workflow that triggers Lambda functions to call APIs, parse responses, and store the results in a database like Amazon RDS or DynamoDB, data from different sources can be efficiently collated and processed. This is particularly useful for applications that rely on real-time data from multiple services. A minimal sketch of triggering such a workflow from code follows this list.
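As a sketch of that third use case (the ARN, execution name and input payload below are placeholders, not a real configuration), a scheduled job could trigger such a workflow from Python with boto3:

import json
import boto3

sfn = boto3.client("stepfunctions", region_name="eu-west-1")  # region is an assumption

# Hypothetical ARN pointing at the collation state machine
STATE_MACHINE_ARN = "arn:aws:states:eu-west-1:123456789012:stateMachine:api-collation"

# Start a workflow run, passing the parameters the first state expects
response = sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    name="daily-collation-2024-01-22",          # execution names must be unique
    input=json.dumps({"report_date": "2024-01-22"}),
)

# Check the run's status (a real caller would poll until it leaves RUNNING)
execution_arn = response["executionArn"]
status = sfn.describe_execution(executionArn=execution_arn)["status"]
print(f"Execution {execution_arn} is {status}")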

The advantages of using AWS Step Functions are numerous:

  • The serverless nature of Step Functions relieves users from managing infrastructure. AWS takes care of scaling, provisioning, and maintenance, allowing focus on workflow definition and optimisation.
  • Its integration with the AWS ecosystem is remarkable. For instance, executing a Lambda function is as straightforward as identifying satellites in AWS Ground Station. The integration with the AWS API expands workflow possibilities, offering integrations across a wide range of AWS services.
  • Each state in a Step Functions workflow comes with error handling and retry capabilities. In the case of known errors, workflows can be extended to restart processes, maintaining continuity in the existing workflow. This feature is particularly useful for managing errors across various workflows.
  • The visual representation of workflows, while not perfect, provides a clear overview of the process’s progress. Diagnostic information, such as Lambda function logs, is accessible within the workflow UI, aiding in troubleshooting. See the screenshot below, which visually demonstrates where the error occurred in the execution.

Screenshot demonstrating an error in AWS Step Functions

However, there are some drawbacks to Step Functions in AWS:

  • Integration with Step Functions may require alterations or complete rewrites of existing components. For instance, modifying Lambda functions to handle JSON formats for workflow compatibility can be time-consuming.
  • Vendor lock-in is a significant concern. Few alternatives match Step Functions’ capabilities, making a switch to another cloud provider a daunting task. The introduction of third-party endpoint integration in Step Functions doesn’t fully mitigate this issue, as switching cloud providers often necessitates a complete overhaul of workflows.
  • The drag-and-drop feature, while visually appealing, doesn’t guarantee logical correctness of workflows. The lack of intuitive syntactical checking can lead to a false sense of ease in workflow creation, overlooking potential limitations. See the screenshot below, where a state machine has been created that creates a Lambda function, deletes it, and then tries to invoke that very same Lambda function. The state machine can be created, but obviously would never work.

Screenshot demonstrating an error in AWS Step Functions


Conclusion

In conclusion, while the visual composer of AWS Step Functions serves primarily as a planning tool, it has proven beneficial in designing microservice architectures, akin to tools like Microsoft’s Azure Logic Apps or Google Cloud Workflows.
Despite initial disappointments, the process of planning and executing workflows within Step Functions can be rewarding.

Looking ahead, the hope is that AWS will evolve Step Functions into a more intuitive, drag-and-drop-centric solution, making cloud computing more accessible and reducing the technical complexity currently involved. The potential of Amazon Q, with its generative AI-driven insights and utilisation of previously built state machines, promises to take this efficiency to the next level, potentially inspiring similar innovations across the cloud computing landscape.
