Lynchpin

Independent analytics consultancy, using data science, data strategy and data engineering to launch you higher



(Video) How to customise your Report library in GA4 20 Jun 9:01 AM (22 days ago)

In the latest episode of our ‘Calling Kevin’ video series, we show you how to customise your GA4 report library – updating your Google Analytics reporting interface to include a new, personalised collection of reports.

Follow these quick and easy steps to begin tailoring your GA4 report library menu and navigation for a more efficient reporting experience. We also share a helpful refresher on how to work with topics, templates, plus more!

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.

The post (Video) How to customise your Report library in GA4 appeared first on Lynchpin.


Data pipelines using Luigi – Strengths, weaknesses, and some top tips for getting started 1 May 8:38 AM (2 months ago)

At Lynchpin, we’ve spent considerable time testing and deploying Luigi in production to orchestrate some of the data pipelines we build and manage for clients. Luigi’s flexibility and transparency make it well suited to a range of complex business requirements and to seamlessly support more collaborative ways of working.

This blog post draws on our hands-on experience with the tool – stress-tested to understand how it performs day to day in real-world contexts. We’ll walk through what Luigi is, why we like it, and where it could be improved, and share some practical tips for anyone considering it to enhance their data orchestration processes and data pipeline capabilities.


What is Luigi and where does it sit in the data pipelines market?

Luigi is an open-source tool developed by Spotify that helps automate and orchestrate data pipelines. It allows users to define dependencies and build tasks with custom logic using Python, offering flexibility and a fairly low barrier to entry for its quite complex functionality.

Despite Spotify’s introduction of a newer orchestration tool, Flyte, Luigi is still widely used by many major brands and continues to receive updates – allowing it to continually mature and become a reliable choice for a range of data orchestration use cases.

Luigi sits amongst many popular tools used for data orchestration in the data engineering space – some of which are paid, while others are similarly open source.

Another tool we’ve used for data orchestration is Jenkins. Although it isn’t designed for more heavy-duty pipelines, we’ve found it to work very well as a lightweight orchestrator, managing tasks and dependencies.

In the following section, we’ll break down some benefits of using Luigi for your data pipelines and a few reasons why you may choose it over a comparable tool such as Jenkins.


What we like ✅

Transparent version control:

One of the key advantages of Luigi is that it’s written in Python. This gives you transparent version control over your data pipelines – every change is committed and traceable: you know exactly what change has been made, you can inspect it, and you can see who made it and when. This becomes even more powerful when linked to a CI/CD pipeline, as we do for some of our clients, since any change merged to the repository automatically becomes the single source of truth for the pipeline.

With Jenkins, for example, changes can be made and it’s not necessarily obvious what was changed or by which team member (unless explicitly communicated) – which becomes increasingly important when you’re managing more complex data pipelines with many moving parts and dependencies.

Dependency handling and custom logic capabilities:

Managing data pipeline dependencies is where Luigi truly stands out. In a tool like Jenkins, downstream tasks can be orchestrated, but this often requires careful scheduling or wrapper jobs, which can become complicated and manual as your requirements grow. Luigi simplifies this and enables smoother automation by letting you define all dependencies directly in Python, supporting logic such as: ‘Run a particular job only after a pipeline completes, and only do this on a Sunday or if it failed the previous Sunday.’

This level of custom logic is trivial in Python but can be difficult to replicate in Jenkins, where perhaps the only option is to run on a Sunday without any conditions surrounding it.
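To make this concrete, here is a minimal sketch (not taken from our production code – all task and file names are hypothetical) of how a dependency and some Sunday-only logic can be expressed as plain Python in Luigi. The output() target also doubles as the task’s completion marker, which becomes relevant in the failure-handling section below.

import datetime
import luigi


class ExtractData(luigi.Task):
    """Upstream pipeline task (hypothetical example)."""
    run_date = luigi.DateParameter()

    def output(self):
        # Luigi uses this target as the task's 'done' marker.
        return luigi.LocalTarget(f"data/extract_{self.run_date.isoformat()}.csv")

    def run(self):
        with self.output().open("w") as out:
            out.write("example,data\n")


class WeeklySummary(luigi.Task):
    """Downstream task that only does its heavy work on a Sunday."""
    run_date = luigi.DateParameter(default=datetime.date.today())

    def requires(self):
        # Dependency declared directly in Python: only runs after ExtractData completes.
        return ExtractData(run_date=self.run_date)

    def output(self):
        return luigi.LocalTarget(f"reports/weekly_{self.run_date.isoformat()}.txt")

    def run(self):
        # Custom scheduling logic is just Python; richer conditions
        # (e.g. 'or if it failed last Sunday') are simply more Python here.
        is_sunday = self.run_date.weekday() == 6  # Monday == 0 ... Sunday == 6
        with self.input().open() as src, self.output().open("w") as out:
            if is_sunday:
                out.write(f"weekly summary built from {len(src.read())} bytes of source data\n")
            else:
                out.write("skipped: not a Sunday\n")


if __name__ == "__main__":
    luigi.build([WeeklySummary()], local_scheduler=True)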

Pipeline failure handling:

Luigi assumes tasks are idempotent: once a task has run, it’s marked as ‘done’ and won’t be re-run unless you manually remove its output. This is particularly useful if you have big, complex pipelines and only need to re-run certain jobs that have failed. Rather than re-running everything, you can find the failed task, delete its output file, and re-execute just that job.

Backfilling at the point of a task:

Luigi handles backfilling easily by allowing users to pass parameters directly into tasks.

This allows you to retrieve historical data (for example, backfilling from the beginning of last year to present) without having to change the script or config files.

Luigi treats a task with different parameter values as a new task, so even if the job has run before, it will recognise the changed parameters and run again for those values.
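As a rough illustration (task and module names are hypothetical, not our client code), a parameterised wrapper task like the one below lets you backfill a date range straight from the command line, with each day treated as its own task:

import datetime
import luigi


class DailyLoad(luigi.Task):
    """One load per day; the date parameter makes each day a distinct task."""
    load_date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"loads/{self.load_date.isoformat()}.csv")

    def run(self):
        with self.output().open("w") as out:
            out.write(f"data for {self.load_date}\n")


class Backfill(luigi.WrapperTask):
    """Fans out one DailyLoad per day in the requested range."""
    start = luigi.DateParameter()
    end = luigi.DateParameter(default=datetime.date.today())

    def requires(self):
        day = self.start
        while day <= self.end:
            yield DailyLoad(load_date=day)
            day += datetime.timedelta(days=1)

# Backfill from the start of last year to today without touching the script itself:
#   luigi --module my_tasks Backfill --start 2024-01-01 --local-scheduler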

Efficiency to set up, host, and use alongside existing infrastructure:

While tools such as Apache Airflow may require a Kubernetes cluster (and more) to begin running, Luigi, by contrast, is far simpler to host. You can run it on a basic VM (Virtual Machine) or through a tool like Google Cloud Platform, using a Cloud Run job. This makes it a great choice for smaller data pipelines or client-specific pipelines where you may want to decouple from the main infrastructure.

Market maturity and active use and development by many large brands:

Luigi has been used by a host of major brands over the years, including Squarespace, Skyscanner, Glossier, SeatGeek, Stripe, Hotels.com, and more. This is integral to its maintenance and viability as a good open-source tool. Its core functionality rarely changes, making it a stable and reliable choice for users. The updates we’ve seen are primarily focused on maintaining security rather than big overhauls to its functionality, which brings us to a few of its shortfalls…


What we don’t like ❎

Limited frontend and UI:

Luigi’s frontend leaves a lot to be desired. Firstly, it only really shows you jobs that are running or have recently succeeded, so if you run many jobs in one day, the History tab fails to give you a useful overview.

When something fails, you’ll be notified and can inspect logs in a location you specify in advance; however, it would be nice if the frontend provided a good summary of this information instead.

Workarounds do exist, such as saving your task history (e.g., tasks that ran, the status, how long they took, etc) in a separate table (for example, Postgres) where it can be visualised in an external run dashboard – providing a more personalised frontend for better monitoring, visibility into run times, failure rates, and so on.
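One way to capture that history is Luigi’s event handler hooks. The sketch below records task outcomes to a local SQLite table purely for illustration – in practice this could just as easily be the Postgres table and run dashboard described above.

import datetime
import sqlite3

import luigi

# Simple history store for illustration; swap for a Postgres connection in practice.
conn = sqlite3.connect("task_history.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS task_history "
    "(task_id TEXT, status TEXT, detail TEXT, recorded_at TEXT)"
)


def record(task, status, detail=""):
    # Store the task id, its status, and a timestamp for the run dashboard.
    conn.execute(
        "INSERT INTO task_history VALUES (?, ?, ?, ?)",
        (task.task_id, status, detail, datetime.datetime.now(datetime.timezone.utc).isoformat()),
    )
    conn.commit()


@luigi.Task.event_handler(luigi.Event.SUCCESS)
def on_success(task):
    record(task, "SUCCESS")


@luigi.Task.event_handler(luigi.Event.FAILURE)
def on_failure(task, exception):
    record(task, "FAILURE", repr(exception))


@luigi.Task.event_handler(luigi.Event.PROCESSING_TIME)
def on_processing_time(task, processing_time):
    record(task, "TIMING", f"{processing_time:.1f}s")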

Setting something like this up would provide more feature parity with a tool such as Jenkins, which, by contrast, does a great job at providing stats and visual indicators for task history, job health, what’s running, and more – right out of the box.

Example of data pipelines built and managed using Jenkins.

Example of data pipelines built and managed using Luigi.

Documentation could be improved:

While Luigi provides the key documentation you need, it’s not always easy to find or navigate – compared to tools such as dbt, the documentation can feel sparse in places, especially for more advanced features or plugins.
For instance, helpful features such as dependency diagrams or task history tracking involve installing separate modules, a process that isn’t particularly well explained in the official documentation.

In many instances, users may find themselves gaining the most clarity about how the tool works by trying things out and learning as they go.

Python path issues – everything must be clear or else Luigi will struggle to find it:

To avoid a barrage of ‘module not found’ errors, Luigi will need to know exactly where everything lives in your environment.

A workaround we found useful is creating a Shell Script that sets out all necessary paths and everything Luigi may possibly need to run successfully.

While something like this may take a little time to set up, it’s a small level of upfront effort to improve your workflow in Luigi and avoid any issues in the longer run.
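As a rough sketch of what we mean (paths and module names here are hypothetical), the wrapper script only needs to export the relevant paths before invoking Luigi:

#!/usr/bin/env bash
# Hypothetical wrapper: make sure Luigi can find the project code and its config.
export PYTHONPATH="/opt/pipelines/my_project:${PYTHONPATH}"
export LUIGI_CONFIG_PATH="/opt/pipelines/my_project/luigi.cfg"

# Run the pipeline with everything on the path it expects.
python -m luigi --module my_project.tasks WeeklySummary --local-scheduler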


Our top tips for getting started with data pipelines using Luigi:

  • If one of your tasks fails, make sure you delete its output file before running it again. If Luigi registers an output file, it’ll assume the task completed successfully and skip it in the re-run.
  • To make up for Luigi’s limited frontend, it’s worth setting up your own custom run dashboard to monitor tasks, since the built-in UI doesn’t provide a tidy, complete overview of task history.
  • For a smooth and pain-free setup, we recommend using a Shell Script to handle Python paths and prevent issues with Luigi being unable to locate your files.
  • Be prepared to dive in and get your hands dirty to really understand how things work in Luigi. Documentation is thin in places or sometimes hard to find when compared to other tools on the market, so expect a bit of a learning curve and some trial and error.


Conclusion:

We think Luigi is a powerful data orchestration tool for anyone who is comfortable with Python, has experience managing data pipelines, and is willing to get to grips with a few quirks that can make onboarding a bit challenging.

If you’re looking for an alternative to tools like Apache Airflow or Jenkins, Luigi is definitely worth trying out. While we recognise that its UI and documentation are lacking when compared to other tools in this space, we found that Luigi’s version controlling, dependency handling, and logic capabilities make it a handy tool for a range of our clients’ use cases.


For more information on how we can support your organisation with data pipelines and data orchestration – including custom builds, pipeline management, debugging and testing, and optimisation services – please feel free to contact us or explore the links below

CONTACT US | VIEW OUR CAPABILITIES | VIEW OUR SERVICES

The post Data pipelines using Luigi – Strengths, weaknesses, and some top tips for getting started appeared first on Lynchpin.


Automated testing: Developing a data testing tool using Cursor AI 11 Dec 2024 8:17 AM (7 months ago)

In this blog, we discuss the development of an automated testing project, using the AI and automation capabilities of Cursor to scale and enhance the robustness of our data testing services. We walk through project aims, key benefits, and considerations when leveraging automation for analytics testing.


Project Background:

We regularly upgrade a JavaScript library that we manage and deploy to numerous sites, adding improvements and enhancements with each release. The library integrates with different third-party web analytics tools and performs a number of data cleaning and manipulation actions. Once we upgrade the library, our main priorities are:

  • Feature testing: Verify new functionality across different sites/environments
  • Regression testing: Ensure existing functionality has not been negatively affected across different sites

To achieve this, we conduct a detailed testing review across different pages of the site. This involves performing specific user actions (such as page views, clicks, search, and other more exciting actions) and ensuring that the different events are triggered as expected. We capture network requests for outgoing vendors, such as Adobe Analytics or Google Analytics, through the browser’s developer tools or a network debugging tool (e.g., Charles), and verify that the correct events are triggered and the relevant parameters are captured accurately in the network requests. By ensuring that all events are tracked with the right data points, we can confirm that both new features and the existing setup are working as expected.

Project Aim:

To optimise this process and reduce the manual effort involved, we developed an automated testing tool designed to streamline and speed up data testing. As an overview, this tool automatically simulates user actions on different sites and different browsers, triggering the associated events, and then checks network requests to ensure that the expected events are fired, and the correct parameters are captured.

Automated Testing Benefits:

In the era of AI, automation is a key driver of efficiency and increased productivity. Automating testing processes offers several key benefits to our development and data testing capabilities, such as:

  • Reduces setup time and the need to create testing documentation: We’re able to run through different tests and scenarios with a one-time setup for each site and each version.
  • More accurate data testing: With a thought-out test plan which is followed precisely, we’re able to put more trust in our testing outcome. This helps us identify issues more quickly.
  • Better test coverage: We can run tests on different browsers and devices, using the same setup.

How We Did It:

We chose Python as the primary scripting language, as it offers flexibility for handling complex tasks. Python’s versatility and extensive libraries made it an ideal choice for rapid development and iteration.

For simulating a variety of user interactions and conducting tests across multiple browsers, we selected Playwright. Playwright is a powerful open-source browser automation tool and API. It supports cross-browser data testing (including Chrome, Safari, and Firefox), allowing us to validate network requests across a broad range of environments.
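As a simplified sketch of the approach (the site, selector, and expected event here are hypothetical, and this assumes GA4’s web endpoint pattern where the event name appears in the en query parameter of /g/collect requests), Playwright lets you listen for network requests while driving the page:

from urllib.parse import urlparse, parse_qs

from playwright.sync_api import sync_playwright

EXPECTED_EVENT = "page_view"  # hypothetical expectation for this test case
captured = []


def on_request(request):
    # GA4 web hits go to a /g/collect endpoint; keep the query parameters for checking.
    if "/g/collect" in request.url:
        captured.append(parse_qs(urlparse(request.url).query))


with sync_playwright() as p:
    browser = p.chromium.launch()          # the same test can run on p.firefox / p.webkit
    page = browser.new_page()
    page.on("request", on_request)

    page.goto("https://www.example.com")   # hypothetical site under test
    page.click("#accept-cookies")          # simulate a user action (hypothetical selector)

    event_names = [params.get("en", [""])[0] for params in captured]
    assert EXPECTED_EVENT in event_names, f"Expected '{EXPECTED_EVENT}', saw {event_names}"
    browser.close()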

We used the Cursor AI code editor to optimise the development process and quickly set up the tool. Cursor’s proprietary LLM, optimised for coding, enabled us to design and create scripts efficiently, accelerating development by streamlining the debugging and iteration process. Cursor’s AI assistant (chat sidebar) boosted productivity by providing intelligent code suggestions, speeding up debugging and investigation. We’ll dive into our experience using Cursor a bit further in the next section.

Lastly, we chose Flask to build the web interface where users can select different types of automated testing. Flask is a lightweight web framework for Python, which we’ve had experience with on other projects. It has its pros and cons, but a key benefit for this project was that it allowed us to get started quickly and focus more on the nuts and bolts of the program.
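For context, the interface doesn’t need to be anything elaborate – a minimal sketch along these lines (routes and suite names are hypothetical, and the real tool hooks into the Playwright tests rather than placeholder functions) is enough to let users pick a test suite from the browser:

from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical registry mapping a suite name to the function that kicks it off.
TEST_SUITES = {
    "feature": lambda: "feature tests queued",
    "regression": lambda: "regression tests queued",
}

FORM = """
<form method="post">
  <select name="suite">{% for s in suites %}<option>{{ s }}</option>{% endfor %}</select>
  <button type="submit">Run tests</button>
</form>
<p>{{ message }}</p>
"""


@app.route("/", methods=["GET", "POST"])
def index():
    message = ""
    if request.method == "POST":
        suite = request.form.get("suite", "")
        runner = TEST_SUITES.get(suite)
        message = runner() if runner else f"Unknown suite: {suite}"
    return render_template_string(FORM, suites=TEST_SUITES, message=message)


if __name__ == "__main__":
    app.run(debug=True)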

Our Experience with Cursor:

Cursor AI played a crucial role in taking this project from ideation to MVP. By carefully prompting Cursor’s in-editor AI assistant, we were able to achieve the results we wanted. The tool allowed us to focus on the core structure of the program and the logic of each test without getting bogged down in documentation and finicky syntax errors.

Cursor also gave us the capability to include specific files, documentation links, and diagrams as context for prompts. This allowed us to provide relevant information for the model to find a solution. Compared to an earlier version of GitHub Copilot that we tested, this was a clear benefit in guiding the model towards the most appropriate outcome.

Another useful benefit of Cursor AI was the automated code completion, which could identify bugs and propose fixes, as well as suggest code to add to the program. This feature was useful when it understood the outcome we were aiming for, which it did more often than not.

However, not everything was plain sailing, and our experience did reveal some drawbacks of AI code editors to be mindful of. For example, relying too much on automated suggestions can distance you from the underlying code, making it harder to debug complex issues independently. It was important to review the suggested code and use Cursor’s helpful in-editor diffs, which clearly outline the proposed changes. This also allowed us to accept or reject those changes, giving us a good level of control.

Another drawback we noticed is that AI-generated code may not always follow best practices or be optimised for performance, so it’s crucial to review and validate the output carefully. For example, Cursor tended to create monolithic scripts instead of separating functionality into components, such as tests and Flask-related parts, which would be easier to manage in the long term.

Another point we noticed was that over-reliance on AI tools could easily lead to complacency, potentially affecting our problem-solving skills and creativity as developers. When asking Cursor to make large changes to the codebase, it can be easy to just accept all changes and test if they worked without fully understanding the impact. When developing without AI assistance (like everyone did a couple of years ago), it’s better to make specific and relatively small changes at a time to reduce the risk of introducing breaking changes and to better understand the impact of each change. This seems to be a sensible approach when working with a tool like Cursor.

What We Achieved – Efficiencies Unlocked:

The automated testing tool we developed significantly streamlined and optimised the data testing process in a number of key ways:

  • Accelerated project development: Using Cursor AI, we rapidly moved through development and completed the project in a short period. The AI-driven interface, combined with Playwright’s capabilities, sped up our debugging process—a major challenge in previous R&D projects. In the past, we often faced delays due to debugging blockers, but now, with the AI assistant, we could directly identify and fix issues, completing the project in a fraction of the time.
  • Built a robust, reusable tool: The tool is scalable and flexible, and can be adapted for different analytics platforms (e.g., Google Analytics, Meta, Pinterest). It is reusable across different projects and client needs, as well as different browsers and environments.
  • Time efficiency & boosted productivity: One of the most valuable outcomes was the significant reduction in manual testing time. With the new automated testing tool, we ran multiple test cases simultaneously, speeding up the overall process. This helped us meet tight project deadlines and improve client delivery without sacrificing quality. Additionally, it freed up time for focusing on challenging tasks and optimising existing solutions.


Conclusion:

With AI, the classic engineering view of ‘why spend 1 hour doing something when I can spend 10 hours automating it?’ has now become ‘why spend 1 hour doing something when I can spend 2-3 hours automating it?’. In this instance, Cursor allowed us to lower the barrier for innovation and create a tool to meet a set of tight deadlines, whilst also giving us a feature-filled, reusable program moving forwards.


For more information about how we can support your organisation with data testing – including our automated testing services – please feel free to contact us now or explore the links below

CONTACT US | VIEW OUR CAPABILITIES | VIEW OUR SERVICES

The post Automated testing: Developing a data testing tool using Cursor AI appeared first on Lynchpin.


(Video) Applying RegEx filtering in Looker Studio to clean up and standardise GA4 reporting 15 Nov 2024 7:36 AM (7 months ago)

In the latest episode of our ‘Calling Kevin’ video series, we show you how to clean up and filter URLs using a few simple expressions in Looker Studio.

By applying these Regular Expressions (RegEx), you can easily remove duplicates, fix casing issues, and tidy up troublesome URL data to standardise GA4 reporting – just as you would have been able to in Universal Analytics.

Expressions used:

  1. To remove parameters from a page path: REGEXP_EXTRACT(Page, "^([\\w-/\\.]+)\\??")
  2. To remove trailing slash from a page path: REGEXP_REPLACE(Page, "(/)$", "")
  3. To make a page path lowercase: LOWER()
    Combined: LOWER(REGEXP_REPLACE(REGEXP_EXTRACT(Page path + query string, "^([\\w-/\\.]+)\\??"), "(/)$", ""))

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.

The post (Video) Applying RegEx filtering in Looker Studio to clean up and standardise GA4 reporting appeared first on Lynchpin.


Webinar: Navigating Recent Trends in Privacy, Measurement & Marketing Effectiveness 3 Oct 2024 1:54 AM (9 months ago)

How do you know what’s working and not working and plan for success as the tides of digital measurement continue to change?

The themes of privacy, measurement and marketing effectiveness triangulate around a natural trade-off and tension: balancing the anonymity of our behaviours and preferences against the ability of brands to reach us relevantly and efficiently.

In this briefing our CEO, Andrew Hood, gives you a practical and independent view of current industry trends and how to successfully navigate them.

Want the full-length white paper?


Building on the themes introduced in the webinar, our white paper lays out an in-depth look at the privacy trends, advanced measurement strategies, and balanced approach you can take to optimise marketing effectiveness.

Unlock deep-dive insight and practical tips you can begin implementing today to guide your focus over the coming months.

VIEW NOW

To access a copy of the slides featured in the webinar, click the button below

View presentation slides

The post Webinar: Navigating Recent Trends in Privacy, Measurement & Marketing Effectiveness appeared first on Lynchpin.


Benefits of marketing mix modelling: Why is MMM so popular right now? 5 Sep 2024 5:01 AM (10 months ago)

Introduction

The concept of marketing mix modelling (often referred to as just ‘MMM’) has been around for a while – as early as the 1960s in fact – which should be no surprise, as the business challenge of what marketing channels to use and where best to spend your money has always been the essence of good marketing, at least if somebody is holding you accountable for that spend and performance!

Marketing mix modelling has its foundations in statistical techniques and econometric modelling, which still holds largely true today. However, the mix of channels and advancements in end-to-end analytics create new challenges to be tackled, not least the expectations of what MMM is and what it can deliver.
In reality, there are various analytics techniques that can be undertaken to answer the overall business question: ‘how do my channels actually impact sales?’. In this blog we will answer some common questions about MMM, address some common (comparable) techniques, and share how and when you might look to choose one method over another.

What is marketing mix modelling?

MMM is a statistical technique, with its roots in regression, that aims to analyse the impact of various marketing tactics on sales over time (other KPIs are also available!). Marketing mix modelling will consider all aspects of marketing to do this, such as foundational frameworks like ‘The 4 Ps of Marketing’ (Product, Price, Place and Promotion).
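For the statistically curious, the toy example below illustrates the underlying idea – regressing a KPI on channel activity – using made-up numbers. It is purely illustrative: a real MMM would also account for adstock, saturation, seasonality, and much more.

import numpy as np

rng = np.random.default_rng(42)
weeks = 104  # roughly two years of weekly data

tv = rng.uniform(0, 100, weeks)      # hypothetical weekly TV spend
search = rng.uniform(0, 50, weeks)   # hypothetical weekly paid search spend
price = rng.uniform(8, 12, weeks)    # hypothetical average price

# Simulated sales driven by the inputs above plus noise.
sales = 500 + 2.0 * tv + 4.0 * search - 30.0 * price + rng.normal(0, 25, weeks)

# Ordinary least squares: sales ~ intercept + tv + search + price
X = np.column_stack([np.ones(weeks), tv, search, price])
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)

for name, beta in zip(["intercept", "tv", "search", "price"], coefs):
    print(f"{name:>9}: {beta:8.2f}")  # estimated impact per unit of each driver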

MMM is similar to econometric modelling in terms of the techniques used; however, there are some key differences. On the whole, econometrics is broader in its considerations and applications, often encompassing aspects of general economic factors in relation to politics, international trade, public policy and more. MMM, on the other hand, focusses more specifically on marketing activities and their impact on business outcomes.

You might also come across the term ‘media mix modelling’ (with the same unhelpful acronym, ‘MMM’). Much like econometrics, media mix modelling tends to differ from marketing mix modelling in its scope and general objective. Media mix modelling tends to have an even narrower focus than marketing mix modelling: as the name implies, it’s aimed more specifically at optimising a mix of media channels, with a particular focus on advertising spend.

Whether it’s marketing mix modelling or media mix modelling you are looking at, the key is to consider the business question you want to answer and ensure your model is trained on the best input variables to answer it – nothing new in the world of a good analytics project!

Why does MMM seem to be gaining traction recently?

In recent years, the general trend has been to measure everything, integrate everything, and link all of your data together, leaving no doubt about who did what, when, and to what end. However, increasing concerns (or at least considerations) around data privacy and ethics have caused marketers to take a second look at how they collect and utilise their data.
There is a growing need to adapt to new privacy regulations, but also a greater desire to respect an individual’s privacy and find better ways to understand what marketing activities drive positive or negative outcomes.

With limitations on the ability to track third-party cookies, approaches such as marketing attribution may become more difficult to implement, although the effectiveness of these data sources is in itself doubtful. And with consent management becoming increasingly granular, even first-party measurement can leave gaps in your data collection.
However, the power that marketing attribution gave marketers is well recognised now, and the desire to continue to be data-led is only increasing. Machine learning has become a commonplace tool for beginning to fill the gaps that are creeping back into the tracking of user behaviour. Organisations are also increasingly eager to build on the power of what they have learnt from these joined-up customer journeys, and there is that need again to look across the whole of marketing, not just digital touchpoints, and replicate that approach in a more holistic way.

So in summary, while marketing mix modelling has never gone away, it is now seeing a revival as an essential tool in a marketer’s toolbelt.

The benefits of MMM: Why should organisations consider using marketing mix modelling services?

MMM is a great tool for any organisation looking to be more data-led in its approach to planning and analysing marketing activities. Key benefits of MMM include:

Ability to measure and optimise the effectiveness of marketing and advertising campaigns:
The purpose of MMM is to measure the impact of your marketing activities on your business outcomes. A well-built marketing mix model will enable you to quantify ROI by channel and make better data-led decisions on the mix of marketing activities that will lead to more optimised campaigns.

Natural adeptness at cross-channel insights:
With increasing limitations on tracking users across multiple channels, MMM’s methodology neatly sidesteps these restrictions by using data at an aggregated level. By its very nature it doesn’t require linking user identities across different devices or tracking individuals using offline channels.

Enables more strategic planning and budgeting:
MMM provides data-driven insight to inform budget planning processes. Its outputs are transparent, allowing organisations to understand the impact each of their channels has on business outcomes and how those channels influence each other within the mix. By incorporating MMM with other tools for scenario planning, spend optimisation and forecasting, organisations can better understand what happened in the past to plan more effectively for the future.

Can be used when granular level data is not available:
As mentioned earlier, MMM works with data at an aggregated level. This offers more flexibility when looking to integrate data inputs into your decision making such as:

  • Linking offline activity with online sales
  • Linking online activity with offline sales
  • Understanding the impact of external influences such as macroeconomic factors, seasonality, competitor activity, etc.

Has a longer-term focus:
MMM is a powerful technique for longer term planning and assessing the impact of campaigns that don’t necessarily provide immediate impact (e.g. brand awareness campaigns, TV, and display advertising etc). By incorporating MMM into a measurement strategy, businesses can ensure longer-term activity is appropriately considered.

Marketing mix modelling vs. marketing attribution modelling: How do they differ? What are the pros and cons?

Earlier in this blog we looked at how marketing mix modelling compares to econometrics and media mix modelling. Another very important modelling approach to consider when looking at marketing effectiveness is marketing attribution.
Marketing attribution differs from marketing mix modelling in a number of important ways – most importantly by relying on a more granular approach. It looks to assign weightings to each individual touchpoint on the customer journey, incorporating each user’s journey and determining whether that journey leads to a successful conversion or not.
This very detailed understanding of how each customer interacts with your channels can be very powerful, but also very complex and time consuming to both collect and analyse. In addition, with the increasing limitations on tracking individuals without their consent, you may end up having to rely on only a partial picture of the user journey.
While of course it is possible to model on a subset of data, you would need to be careful that the user journey you are looking to understand is not unfairly weighted to those channels (or individuals) that are easier to track.
Marketing attribution also uses a wider range of modelling algorithms, from the simple (linear, time-decay) to the more complex (Markov Chains, Game Theory, ML models). This range of models to select from can be both a benefit and a hindrance, with difficulties arising when you’re not sure what marketing attribution model will suit your business needs best.

Marketing mix modelling does have its own drawbacks to consider too. The biggest consideration when determining if MMM is suitable for you is to understand how much historical data you have.
While a marketing attribution model can work on just a few months of data, so long as it has decent volume and is fairly representative of your typical user journeys, MMM relies on trends over longer periods of time – typically a minimum of 2 years’ worth of data is advised before undertaking an MMM project. MMM also works best when looking at the broader impacts marketing has on your goals. Therefore, if you need to analyse specific campaign performance or delve deeper into specific channels, then marketing attribution will be the better bet.

Can MMM and marketing attribution complement each other?

In a previous blog, we discussed the merits of using both marketing attribution and MMM side by side to provide a more powerful and comprehensive understanding of marketing effectiveness.
While a marketing attribution model will focus on individual touchpoints and their contributions, MMM will take a holistic view, considering the overall impact of marketing inputs. By combining these two approaches, marketers can gain a more complete picture of how different marketing elements work together to drive business outcomes and demystify the balance needed across marketing activity for maximum business performance.

Summary

Marketing mix modelling is a very powerful and well-established statistical technique. Most marketers should at least be exploring the benefits and insight it provides into the relationship between marketing activity and business performance to optimise planning and decision making.

One barrier to entry in starting an MMM project is navigating what may appear to be a complex set of approaches and techniques. While variations of MMM do exist – econometrics, marketing mix modelling, and media mix modelling – the key difference lies in the scope and objective of the business question you aim to answer. Successfully choosing and developing a model depends on fully understanding your business needs and the data available to you. Investing time upfront to determine what you are looking to achieve is essential to getting the right outcomes.

MMM is best used for strategic planning and determining longer term impacts of your marketing activities. Therefore, if you require more in-depth campaign and channel analysis, then marketing attribution may be more suitable for your business needs. However, it’s important to note that MMM and marketing attribution can work side by side to develop a more complete picture of your marketing activities. While MMM allows greater flexibility when working with a mix of channels that are both tracked and not tracked, the ability of marketing attribution to provide a more granular analysis of your marketing journeys, channels, and campaigns allows for day-to-day optimisation of your marketing activities alongside the longer-term strategy set out by your MMM insights.


If you are ready to explore MMM, marketing attribution, or anything in between, we’d be delighted to discuss your needs in more detail.

For more information on boosting marketing effectiveness and how our team can assist with topics raised in this blog, please visit the links below

Register for our Webinar: ‘Navigating Recent Trends in Privacy, Measurement & Marketing Effectiveness’ on 25 Sept 2024 | VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US

The post Benefits of marketing mix modelling: Why is MMM so popular right now? appeared first on Lynchpin.


Google (finally) supports Custom Event Data Import in GA4 17 Jul 2024 3:06 AM (12 months ago)


Google has recently updated their GA4 ‘Data Import’ feature to finally support Custom Event metadata. This is a significant development, but before we dive in, let’s remind ourselves of a key point: despite its name, which can give false hope after an outage, ‘Data Import’ is NOT a solution for repopulating lost data. It is, however, a powerful tool for augmenting existing data with information that isn’t directly collected in GA4. Common sources that we find our clients wanting to integrate include CRM systems, offline sales data, or other third-party analytics tools.

When would Custom Event Data Import be useful?

Well, there are many cases:

The information we import might not be available until after collection. This could include data that is processed or generated by third-party tools after the event has already occurred. A prime example would be cost data for non-Google ad clicks and impressions.

Some information might not be something we want exposed on our site. Importing such data ensures it remains secure and is only used for internal analysis. These might include things like a product’s wholesale price, or a user’s lifetime customer value.

Information collected offline, such as in-store purchases or interactions, could be integrated with your existing GA4 data to allow for a more complete view of customer behaviour across both online and offline touchpoints.

Although Data Import supported Cost, Product, and User-scoped data, what was conspicuously absent up until now was the ability to import data directly scoped to existing Custom Events. This is particularly significant because, as Google likes to remind us, GA4 is ultimately event-based.

To understand if this development could be useful for you, consider the events you already track. Is there any information directly related to these events and their custom dimensions that you don’t collect in GA4, but have available offline or in another tool? If so, Custom Event data import could be very handy.

It’s been a long and somewhat painful journey with GA4, but it’s great to see it gradually becoming feature complete.

Of course, if you’re looking to augment your GA4 data with information available at the point of collection, Lynchpin would recommend harnessing the power of a server-side GTM implementation to augment your data before it even arrives in GA4 itself.

For more information on server-side GTM and its advantages we highly recommend reading the blogs below:


To discuss any of the topics mentioned in this blog or to find out how Lynchpin can support you with any other data and analytics query, please do not hesitate to reach out to a member of our team.

For more information about what we do

VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US

The post Google (finally) supports Custom Event Data Import in GA4 appeared first on Lynchpin.


Working with dbt & BigQuery: Some issues we encountered and their solutions 2 Jul 2024 12:51 AM (last year)

Here at Lynchpin, we’ve found dbt to be an excellent tool for the transformation layer of our data pipelines. We’ve used both dbt Cloud and dbt Core, mostly on the BigQuery and Postgres adapters.

We’ve found developing dbt data pipelines to be a really clean experience, allowing you to get rid of a lot of boilerplate or repetitive code (which is so often the case writing SQL pipelines!).

It also comes with some really nice bonuses like automatic documentation and testing along with fantastic integrations with tooling like SQLFluff and the VSCode dbt Power User extension.

As with everything, as we’ve used the tool more, we have found a few counter-intuitive quirks that left us scratching our heads a little, so we thought we’d share our experiences!

All of these quirks have workarounds, so we’ll share our thoughts plus the workarounds that we use.


Summary:

  1. Incremental loads don’t work nicely with wildcard table queries in BigQuery
  2. The sql_header() macro is the only way to do lots of essential things and isn’t really fit for purpose.
  3. Configuring dev and prod environments can be a bit of a pain


1. Incremental loads with wildcard tables in BigQuery

Incremental loads in dbt are a really useful feature that allows you to cut down on the amount of source data a model needs to process. At the cost of some extra complexity, they can vastly reduce query size and the cost of the pipeline run.

For those who haven’t used it, this is controlled through the is_incremental() macro, meaning you can write super efficient models like this.

SELECT *
FROM my_date_partitioned_table
{% if is_incremental() %}
WHERE date_column > (SELECT MAX(date_column) FROM {{ this }})
{% endif %}

This statement is looking at the underlying model and finding the most recent data based on date_column. It then only queries the source data for data after this. If the table my_date_partitioned_table is partitioned on date_column, then this can have massive savings on query costs.

Here at Lynchpin, we’re often working with the GA4 → BigQuery data export. This free feature loads a new BigQuery table events_yyyymmdd every day. You can query all the daily export tables with a wildcard * and also filter on the tables in the query using the pseudo-column _TABLE_SUFFIX

SELECT
*
FROM
`lynchpin-marketing.analytics_262556649.events_*`
WHERE
_TABLE_SUFFIX = '20240416';

The problem is incremental loads just don’t work very nicely with these wildcard tables – at least not in the same way as a partitioned table in the earlier example.

-- This performs a full scan of every table - rendering
-- incremental load logic completely useless!
SELECT *,
_TABLE_SUFFIX as source_table_suffix
FROM `lynchpin-marketing.analytics_262556649.events_*`
{% if is_incremental() %}
WHERE _TABLE_SUFFIX > (SELECT MAX(source_table_suffix) FROM {{ this }})
{% endif %}

This is pretty disastrous because scanning every daily table in a GA4 export can be an expensive query, and running this every time you load the model doesn’t have great consequences for your cloud budget 💰.

The reason this happens is down to a quirk in the query optimiser in BigQuery – we have a full explanation and solution to it at the end of this blog 👇 if you want to fix this yourself.


2. You have to rely on the sql_header() macro quite a lot

The sql_header() macro is used to run SQL statements before the code block of your model runs, and we’ve actually found it to be necessary in the majority of our models. For instance, you need it for user defined functions, declaring and setting script variables, and for the solution to quirk #1.

The problem is that sql_header() macro isn’t really fit for purpose and you run into a few issues:

  • You can’t use macros or Jinja inside sql_header() as it can lead to weird or erroneous behaviour, so no using ref, source or is_incremental() for example
  • You can’t include your sql_header() configurations in tests, currently meaning any temporary functions created can’t be recreated in test cases


3. There are a few workarounds needed to truly support a dev and prod environment

dbt supports different environments, which can be easily switched at runtime using the --target command line flag. This is great for keeping a clean development environment separate from production.

One thing we did find a little annoying was configuring different data sources for your development and production runs, as you probably don’t want to have to run on all your prod data every time you run your pipeline in dev. Even if you have incremental loads set up, a change to a table schema soon means you need to run a full refresh which can get expensive if running on production data.

One solution is reducing the amount of data using a conditional like so:

{% if target.name == 'dev' %}
        AND date_column BETWEEN
        TIMESTAMP('{{ var("dev_data_start_date") }}')
        AND TIMESTAMP('{{ var("dev_data_end_date") }}')
{% endif %}

This brings in extra complexity to your codebase and is annoying to do for every single one of your models that query a source.

The best solution we saw to this was here: https://discourse.getdbt.com/t/how-do-i-specify-a-different-schema-for-my-source-at-run-time/561/3

The solution is to create a dev version of each source in the YAML file, named {source_name}_dev (e.g. my_source_dev for the dev version of my_source), and then have a macro that switches which source is used based on the target value at runtime.

Another example in this vein: getting dbt to enforce foreign key constraints requires this slightly ugly expression to switch between schemas in the schema.yaml file

- type: foreign_key
  columns: ["blog_id"]
  expression: "`lynchpin-marketing.{{ 'ga4_reporting_pipeline' if target.name != 'dev' else 'ga4_reporting_pipeline_dev' }}.blogs` (blogs)"


Explanation and solution to quirk #1

Let’s revisit

SELECT
  *
FROM
  `lynchpin-marketing.analytics_262556649.events_*`
WHERE
  _TABLE_SUFFIX = '20240416';

This is fine – the table scan performed here only scans tables with suffix equal to 20240416 (i.e. one table), and bytes billed is 225 KB

OK, so how about only wanting to query from the latest table?

If we firstly wanted to find out the latest table in the export:

-- At time of query, returns '20240416'
SELECT
    MAX(_TABLE_SUFFIX)
  FROM
    `lynchpin-marketing.analytics_262556649.events_*`

This query actually has no cost!

Great, so we’ll just put that together in one query:

SELECT
  *
FROM
  `lynchpin-marketing.analytics_262556649.events_*`
WHERE
  _TABLE_SUFFIX = (
  SELECT
    MAX(_TABLE_SUFFIX)
  FROM
    `lynchpin-marketing.analytics_262556649.events_*`)

Hang on… what!?

BigQuery’s query optimiser isn’t smart enough to get the value of the inner query first and use that to reduce the scope of tables queried in the outer query 😟

Here’s our solution, which involves a slightly hacky way to ensure the header works in both incremental and non-incremental loads. We implemented this in a macro to make it reusable.

{% call set_sql_header(config) %}
DECLARE table_size INT64;
DECLARE max_table_suffix STRING;
SET table_size = (SELECT size_bytes FROM {{ this.dataset }}.__TABLES__ WHERE table_id='{{ this.table }}');
IF table_size > 0 THEN SET max_table_suffix = (SELECT MAX(max_table_suffix) FROM {{ this }});
ELSE SET max_table_suffix = '{{ var("start_date") }}';
END IF;
{% endcall %}
-- Allows for using max_table_suffix to filter source data.
-- Example usage:
SELECT
        *
FROM {{ source('ga4_export', 'events') }}
{% if is_incremental() %}
    WHERE _table_suffix > max_table_suffix
{% endif %}


We hope you found this blog useful. If you happen to use any of our solutions or come across any strange quirks yourself, we’d be keen to hear more!

To find out how Lynchpin can support you with data transformation, data pipelines, or any other measurement challenges, please visit our links below or reach out to a member of our team.

To find out how Lynchpin can help

VIEW OUR CAPABILITIES | VIEW OUR SERVICES | CONTACT US

The post Working with dbt & BigQuery: Some issues we encountered and their solutions appeared first on Lynchpin.


(Video) ‘(not set)’ landing page values in GA4: Explained 13 Jun 2024 3:42 AM (last year)

In the latest episode of our ‘Calling Kevin’ video series, our Senior Data Consultant, Kevin, tackles a common issue many users face in Google Analytics: an increase in the number of ‘(not set)’ landing page values in GA4.

In the video below, Kevin covers:

  • How session timeouts lead to ‘(not set)’ values
  • How automatic event collection affects landing page data
  • Tips for checking and fixing your GA4 setup.

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.

The post (Video) ‘(not set)’ landing page values in GA4: Explained appeared first on Lynchpin.


(Video) Breaking down user metrics in GA4 23 May 2024 3:20 AM (last year)

In the latest from our ‘Calling Kevin’ video series, our Senior Data Consultant, Kevin, covers a few common questions about user metrics in GA4.

In this quick walkthrough, Kevin dishes on the definitions and differences between the user metrics available in GA4 – some of which may be familiar to those experienced with Universal Analytics, while some changes are exclusive to GA4, causing some confusion for Google Analytics users both new and old.

  • How are Users defined in standard reports?
  • What is an Active user in GA4?
  • Are Users and Total users the same?

For more quick GA4 tips, be sure to check out other videos from our ‘Calling Kevin’ series.

The post (Video) Breaking down user metrics in GA4 appeared first on Lynchpin.
