# Fleack — full content feed

Generated at 2026-04-23T10:59:01.711Z. Source of truth: https://fleack.io. See also https://fleack.io/llms.txt for the navigation-level summary.

---

# Firebase Remote Config Alternatives for Mobile A/B Testing in 2026

URL: https://fleack.io/blog/firebase-remote-config-alternatives-mobile-ab-testing
Author: Ayrton Lecoutre
Published: 2026-04-19
Description: Six honest alternatives to Firebase Remote Config for mobile A/B testing — Statsig, LaunchDarkly, Optimizely, Flagsmith, Unleash, Fleack.

The question isn't usually "is Firebase Remote Config broken?" — it works, it's free, and for basic parameter toggles it does the job. The question is why, after six or twelve months, mobile growth teams outgrow it.

Take a mid-core mobile gaming studio running a weekend battle-pass promo. The team wants to test three interstitial-ad frequencies across high-ARPU and low-ARPU segments, and promote the winner before Monday. With Firebase Remote Config that means three new typed parameters, one new Firebase Audience that refreshes every 24 hours, and — if any parameter shape changed — an SDK update shipped through App Store review. By Monday the sale is over and the test never ran.

Four reasons this pattern repeats, and how the six real alternatives each solve one of them.

## Why Firebase Remote Config feels limited past 10 tests a month

### Is Firebase's statistics engine good enough for a 2–3% lift?

No. Firebase A/B Testing was built on [Google Optimize](https://support.google.com/optimize/answer/12979939), which Google deprecated on **September 30, 2023**. The Firebase surface still runs, but no new stats primitives have shipped since. You get a lightweight Bayesian output — "variant B is 62% likely to beat A" — and no sequential testing, no CUPED variance reduction, no multivariate, no false-discovery-rate correction across multiple metrics. To detect a 2–3% conversion lift reliably on that engine, you'd need to run for weeks.
By then product has moved on.

### Why does every Firebase Remote Config change need a developer?

Because Firebase treats parameters as typed contracts with your app binary. Adding a new parameter means a new SDK build; changing a parameter's type means a migration plus a release plus a store review. At one test per quarter, that's fine. At one test per week, the Jira backlog becomes the experimentation strategy.

Apple publishes that [~50% of App Store reviews complete within 24 hours and 90% within 48](https://developer.apple.com/app-store/review/). Most of the time that's survivable. Edge cases with new capabilities stretch to 3–7 days — and "most of the time" isn't a business plan.

### Can Firebase Remote Config segment users at scale?

Only on predefined dimensions — country, platform, app version, Firebase Audience. Anything dynamic (ARPU bucket, player level, days since install, LTV tier) has to flow through BigQuery, get exported into a Firebase Audience, and refresh every 24 hours. For the weekend battle-pass example above, the segment "high-ARPU Android players in Tier-1 countries, days-since-install ≥ 7" is live on Monday morning — after the sale.

### Is Firebase Remote Config really mobile-first for experimentation?

Remote Config itself is mobile-first. A/B Testing on top inherits the SDK-update tax: you can't test a new parameter shape until old app versions have adopted it. On iOS that's days. On Android with lazy auto-updaters, it's weeks. For a mobile-first team that's the opposite of "mobile-first".

If any of these hurt, there are better options. Here they are, ordered by the problem they actually solve.

## The 6 alternatives worth knowing in 2026

### 1. Statsig — for teams that want real stats

**Best for:** teams shipping 20+ experiments a month that need trustworthy decisions, not tea-leaf reading.

**What it does well.** Statsig ships a serious experimentation engine: sequential testing, CUPED variance reduction, holdouts, pre-exposure checks, proper multi-metric FDR correction. Their "dynamic config" primitive is typed remote config with the same stats layer bolted on. Mobile SDKs for iOS, Android, Unity, Flutter and React Native are all first-class. Free tier covers 1M events/month.

**Honest limitations.** Pricing scales fast on active mobile apps, which generate events freely. The UI is power-user-heavy — your marketers will not self-serve without a data partner. And like Firebase, every new parameter still requires an SDK that ships with your binary.

**Pick Statsig if** you have a data scientist who will own the experimentation system and an engineering team that's fine absorbing SDK updates. For more on how Statsig stacks up against LaunchDarkly and Firebase on mobile, see our [head-to-head comparison](/blog/statsig-vs-launchdarkly-vs-firebase-which-one-for-mobile).

### 2. LaunchDarkly — for enterprise feature-flag governance

**Best for:** multi-platform companies where "deploy decoupled from release" is a board-level policy and compliance matters.

**What it does well.** LaunchDarkly is the reference for feature flags. Best-in-class SDK reliability, proper flag governance (approval workflows, audit trails, stale-flag cleanup), enterprise SSO, SOC 2 Type II, HIPAA. The targeting engine is excellent.

**Honest limitations.** The experimentation layer bolted on in 2022 is functional, not best-in-class — if stats is your primary need, Statsig or Optimizely beat it. Pricing starts in the low four figures per month for small teams and scales with MAU; at 2M MAU running 50+ tests a month, the annual bill is six figures. See our [LaunchDarkly alternatives guide for marketing teams](/blog/launchdarkly-alternatives-marketing-teams-2026) if you've already priced them out.

**Pick LaunchDarkly if** your engineering leadership wants flags-as-infrastructure, you can absorb the cost, and experimentation is secondary to operational safety.

### 3. Optimizely Feature Experimentation — for stats-first web-leaning teams

**Best for:** shops that grew up on Optimizely X and want the same methodology on mobile.

**What it does well.** Best-in-class stats methodology, proper multivariate testing, a solid experimentation DSL. The mobile SDK hooks cleanly into feature flags. The web heritage means the stats story is airtight.

**Honest limitations.** Optimizely's mobile support has been in maintenance mode for the last few years — new primitives ship on web first, mobile later, and some never make it. Pricing is quote-only above 1M MAU, which is where most real mobile teams live. If mobile is your primary surface, this is the wrong tool.

**Pick Optimizely if** you have a web-first product and mobile is a secondary surface you want on the same stats engine.

### 4. Flagsmith — the open-source alternative

**Best for:** teams that want LaunchDarkly-style feature flags without the LaunchDarkly bill.

**What it does well.** Open-source under a [BSD-3-Clause licence](https://github.com/Flagsmith/flagsmith/blob/main/LICENSE). Self-hostable for effectively zero cost beyond infra; managed SaaS tier if you don't want to run it. Mobile SDKs for iOS, Android, Flutter, React Native. Feature flags plus segmentation plus a basic A/B testing layer.

**Honest limitations.** A/B testing in Flagsmith is "check which variant was assigned" — not a stats engine. You wire up analytics and significance checks yourself. The community edition lacks some of the audit and approval controls that justify LaunchDarkly's price, which matters for regulated industries. If you're weighing whether a flag tool is the right A/B platform at all, start with [Feature Flags vs A/B Testing](/blog?cluster=feature-flags-alternatives).

**Pick Flagsmith if** you have devops capacity to self-host and you mostly need flags and progressive rollouts rather than rigorous experimentation.

### 5. Unleash — another open-source route

**Best for:** technical teams already deep in the Kubernetes / Prometheus / Grafana open-source world.

**What it does well.** Apache 2.0-licensed, self-hosted-or-SaaS. Stronger developer-experience focus than Flagsmith, well-designed admin UI, excellent targeting engine, first-class mobile SDKs.

**Honest limitations.** A/B testing is not the core pitch — it's "flags with metrics hooks" that you stitch into your existing analytics stack. If the experimentation narrative matters to your stakeholders, Unleash doesn't tell it.

**Pick Unleash if** you like the Flagsmith story but prefer Unleash's DX and admin surface.

### 6. Fleack — for mobile teams whose bottleneck is dev throughput

**Best for:** mobile gaming studios and growth teams where the blocker isn't the tool — it's waiting on engineering for every test idea.

**What it does well.** Fleack sits between your mobile app and your backend as an edge layer — see [how it works](/#how-it-works) for the full picture. It watches your existing API traffic, uses AI to detect the parameters worth testing (prices, ad frequencies, interstitial cadence, reward multipliers, onboarding flows), and lets marketers launch A/B tests on them without an SDK, without a rebuild, and without a store review. The backoffice is built for non-technical users: AI-assisted test creation from a plain-English prompt, natural-language segments, multi-metric results with Bayesian confidence, AI-generated narratives on outcomes. The [hidden cost of SDK-based platforms](/blog?cluster=feature-flags-alternatives) — the dev-loop tax — is what Fleack is built to remove.

**Honest limitations.** Fleack is a new category, not a drop-in Firebase replacement.
It works on parameters exposed in API responses — if your app reads everything from static bundles committed at build time, Fleack has nothing to work with. The team is small; this isn't Series-D infrastructure yet. And if you genuinely need client-side feature flags with millisecond evaluation offline, a traditional SDK remains the right answer.

**Pick Fleack if** you test mobile parameters weekly, engineering time is your scarce resource, and your app already fetches configuration from a backend.

## If you need X, pick X. If your real problem is the dev loop, none of the above.

Here's the short version, straight out of how mobile teams actually decide:

- **If you need best-in-class stats** (sequential testing, CUPED, multi-metric), pick **Statsig**.
- **If you need enterprise feature-flag governance** (SOC 2, HIPAA, audit trails), pick **LaunchDarkly**.
- **If you want a web-first stats pedigree** and mobile is secondary, pick **Optimizely**.
- **If you need open-source self-host**, pick **Flagsmith** or **Unleash**.

But if your actual problem is that marketers on a mobile team can't run tests without an engineer shipping a new build — *none of them are built for that*. They all assume "experiment = SDK + release + store review" is fine. For a weekly-testing mobile studio, that assumption is the whole problem.

That's the gap Fleack fills. It's not a better Firebase Remote Config. It's a different axis: [marketer-led experimentation](/blog/marketer-led-experimentation-why-your-growth-team-shouldnt-wait-on-engineering) on live API traffic, no SDK involved.
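
To make "experimentation on live API traffic" concrete, here is a minimal sketch of the idea in Python, assuming the backend returns JSON and the edge layer can see a stable user ID. It is an illustration of the pattern, not Fleack's implementation: `EXPERIMENT`, the field path `ads.interstitial_every_n_levels`, and every function name are hypothetical, chosen to mirror the interstitial-frequency example above.

```python
import hashlib
import json

# Hypothetical experiment definition; all names and values are illustrative.
EXPERIMENT = {
    "key": "interstitial_frequency_v1",
    "field": "ads.interstitial_every_n_levels",  # dotted path into the JSON body
    "variants": {"control": None, "A": 2, "B": 5},  # None = keep the backend value
}


def assign_variant(user_id: str, experiment_key: str, variants: list[str]) -> str:
    """Hash the user into a variant so the same user always gets the same one."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def set_path(body: dict, dotted_path: str, value) -> None:
    """Overwrite an existing nested field; raise KeyError if the path is absent."""
    *parents, leaf = dotted_path.split(".")
    node = body
    for key in parents:
        node = node[key]
    if leaf not in node:
        raise KeyError(dotted_path)
    node[leaf] = value


def rewrite_response(raw_body: str, user_id: str, exposure_log: list) -> str:
    """Rewrite one backend response for one user and record the exposure."""
    body = json.loads(raw_body)
    names = sorted(EXPERIMENT["variants"])  # fixed order keeps bucketing stable
    variant = assign_variant(user_id, EXPERIMENT["key"], names)
    override = EXPERIMENT["variants"][variant]
    if override is not None:
        set_path(body, EXPERIMENT["field"], override)
    exposure_log.append(
        {"user": user_id, "experiment": EXPERIMENT["key"], "variant": variant}
    )
    return json.dumps(body)
```

Hashing the experiment key together with the user ID keeps assignment sticky without storing any per-user state, and refusing to write a field that is missing from the response is one simple way to avoid inventing parameters the app never asked for. A production edge layer would more likely run in the edge platform's own runtime; Python is used here only to keep the sketch readable.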
## Quick-look comparison

| Tool | Stats engine | Mobile-first | No-SDK | Licence / starting price |
| ---- | ------------ | ------------ | ------ | ------------------------ |
| Firebase Remote Config | Light (ex-Optimize) | Yes | No | Free |
| Statsig | Strong | Yes | No | Free tier, event-based pricing |
| LaunchDarkly | Medium | Yes | No | Low four figures/mo small team |
| Optimizely Feature Experimentation | Strong | Web-leaning | No | Quote |
| Flagsmith | Light | Yes | No | BSD-3-Clause, free self-host |
| Unleash | Light | Yes | No | Apache 2.0, free self-host |
| Fleack | Bayesian, multi-metric | Yes | **Yes** | Free tier |

## Where Firebase Remote Config still wins

One honest point. If you're a small team already on the Google stack (Analytics, BigQuery, Crashlytics), running 1–2 tests per quarter, Firebase Remote Config is fine. Free tier is generous, integration cost is zero, experience is coherent.

The inflection point is around 5–10 tests per month. Past it, the dev-loop tax exceeds what you'd pay for a dedicated platform. Past 20 tests per month, the SDK rebuild cadence itself is the bottleneck — and that's when teams stop looking for "a better Firebase" and start looking for [something that doesn't need an SDK at all](/#features).

Count the tests you actually shipped in the last 90 days — not the ones you wanted to ship. If the number is under 4, the problem isn't Firebase; it's the ideas-to-variant pipeline. Migrating tools won't fix it. Work on that first.

## The question Firebase doesn't ask

Every tool in this list — Firebase included — asks the same implicit question: "how can we give developers better experimentation primitives?" It's the right question for a 2015 stack.

The 2026 question, at least for mobile growth teams, is different: "why do developers need to be involved at all for a marketer to change an ad frequency?"
If that question resonates, the natural next read is our take on [marketer-led experimentation](/blog/marketer-led-experimentation-why-your-growth-team-shouldnt-wait-on-engineering). If it doesn't — pick the tool above that matches your actual constraint and ship.

---

# LaunchDarkly Alternatives for Marketing Teams in 2026

URL: https://fleack.io/blog/launchdarkly-alternatives-marketing-teams-2026
Author: Ayrton Lecoutre
Published: 2026-04-19
Description: LaunchDarkly is a dev tool sold as self-serve. Seven honest alternatives for marketing teams that actually need to ship experiments without a Jira ticket.

LaunchDarkly is a great product — for engineers. The problem starts when the company bought it on the promise of "self-serve experimentation" and marketing opens the backoffice.

Take a mid-size mobile gaming studio that signed a six-figure LaunchDarkly contract last year. The growth manager wants to test a 7-day paywall discount for players whose ARPU dropped below the median. Opening LaunchDarkly: a flag tree organized by service, environment toggles, targeting rules written as `User.attribute.lowerARPUSegment == true`, a Bayesian test module locked behind a higher plan, and — critically — the "new flag" button still creates a typed parameter that engineering has to reference in a new SDK release. By week three, growth has given up and filed a Jira ticket.

Here's the honest landscape. Seven alternatives, ordered by the bottleneck they solve for marketing teams.

## Why marketing teams look for LaunchDarkly alternatives

### Is LaunchDarkly actually usable by a marketer?

For governance workflows — approvals, audit trails, stale-flag cleanup — yes, and it's best-in-class at it. For defining a test on Tuesday and shipping it on Wednesday without a dev in the loop — no, because LaunchDarkly treats every flag as a typed contract with your client SDK. A new flag means an SDK change means a release.

LaunchDarkly doesn't hide this.
They're explicit that experimentation sits on top of feature flags, and feature flags are developer-owned. Their experimentation add-on, launched in 2022, is billed per-experiment and requires the same SDK integration.

### What does LaunchDarkly actually cost at mobile-app scale?

LaunchDarkly's Foundation plan starts at approximately $10–20 per seat per month with a 25-seat minimum on annual billing. The Enterprise tier adds SOC 2 Type II, HIPAA, role-based approvals, and advanced audit logging — all quote-only. At 2M monthly active users with experimentation turned on, teams consistently report annual bills in the six figures. Public reviews on [TrustRadius](https://www.trustradius.com/products/launchdarkly/reviews) and [G2](https://www.g2.com/products/launchdarkly/reviews) are consistent on this.

For a growth team shipping 20 tests a month, that's a high bill to pay for tooling your marketers still can't self-serve.

### Can marketing self-serve within LaunchDarkly if we buy enough seats?

Partially. The UI allows flag toggles and targeting-rule edits. What it doesn't allow is:

- Creating a *new* parameter without engineering shipping it client-side
- Changing a parameter's *shape* (string → object, scalar → list) without a release
- Running a meaningful experiment without the experimentation add-on and an SDK upgrade

The consistent pattern: a flag that already exists becomes a marketer-facing lever. A flag that doesn't yet exist is a dev ticket.

## The 7 alternatives worth knowing in 2026

### 1. Split (now Harness FME) — the closest LaunchDarkly clone

**Best for:** teams that want LaunchDarkly's governance model at a slightly better price point.

**What it does well.** Feature-flag governance, environment promotion, targeting rules, experimentation on top. Since Harness's 2024 acquisition, tight integration with the Harness CI/CD platform.

**Honest limitations.** Same marketer-usability problem as LaunchDarkly — it's a developer tool with a marketing veneer. Pricing is LaunchDarkly-adjacent at scale. If the reason you're leaving LD is "too expensive", Split is a 20% discount, not a category change. If the reason is "marketing can't use it", Split won't fix that.

**Pick Split if** LaunchDarkly's contract is up for renewal and you want a negotiation lever, not a new paradigm.

### 2. Statsig — experimentation-first, still dev-oriented

**Best for:** teams where a data scientist owns experimentation and marketers get a read-only view.

**What it does well.** Serious stats engine — sequential testing, CUPED, holdouts, pre-exposure checks, proper multi-metric FDR correction. Free tier covers 1M events/month. We go deeper on Statsig in our [Firebase Remote Config alternatives](/blog/firebase-remote-config-alternatives-mobile-ab-testing) breakdown.

**Honest limitations.** The UI is power-user-heavy. Your marketers will not self-serve. Every new dynamic config parameter still requires an SDK integration. If your growth team is the primary user, Statsig solves stats but not access.

**Pick Statsig if** you have the data science function and the engineering capacity to absorb SDK updates monthly.

### 3. Optimizely Feature Experimentation — the marketer-legacy choice

**Best for:** shops that grew up on Optimizely X Web and want to keep marketing in the driver's seat.

**What it does well.** Optimizely's web product was marketer-native for a decade — the visual editor, test queue, stakeholder reviews are all there. Best-in-class stats methodology. The feature-experimentation platform retains that heritage.

**Honest limitations.** Mobile support has been in maintenance mode for ~2 years. New primitives ship on web first, mobile late or never. Pricing is quote-only above 1M MAU. If mobile is your primary surface, the marketer-friendly UX is behind a platform that doesn't prioritize your surface anymore.

**Pick Optimizely if** web is your dominant channel and mobile is secondary.

### 4. AB Tasty — marketer-first, enterprise-sold

**Best for:** enterprise marketing teams with a multi-year contract and an onboarding team.

**What it does well.** Built for marketers from day one. Visual editor on web, decent mobile SDK story, personalization and widget engine on top. Strong European presence.

**Honest limitations.** Classic enterprise sales motion — quote-only pricing, multi-year contracts, mid-six-figure entry points. Mobile experimentation is SDK-based and inherits the same "new parameter = new release" tax as LaunchDarkly. The marketer-friendly UX doesn't close the gap on mobile.

**Pick AB Tasty if** you're already in the RFP stage for enterprise marketing tooling and mobile is a secondary surface.

### 5. Kameleoon — European, AI-first in 2026

**Best for:** web-heavy European teams that want AI-assisted test creation.

**What it does well.** Marketer-native UX, strong personalization, and in late 2025 launched PBX (Prompt-Based Experimentation) — a natural-language test creator that converts "test a 10% discount for users who bounced last month" into a live experiment on web. GDPR-native posture appreciated in regulated industries.

**Honest limitations.** PBX is web-only at the time of writing. Mobile support exists via SDK and lags the web product. Pricing is quote-only mid-market and up.

**Pick Kameleoon if** you're primarily web, operate in Europe, and the AI-assisted authoring story matters.

### 6. ConfigCat — the budget-friendly flag tool

**Best for:** small teams that want feature flags without a six-figure contract.

**What it does well.** Flat, transparent pricing starting at about $99/month for production use, unlimited flags, 10 environments. Mobile SDKs for iOS, Android, Flutter, React Native. Simple admin UI that's less intimidating than LaunchDarkly's.

**Honest limitations.** Experimentation is minimal — flag assignment with an export to your analytics tool. Not a stats engine. And like every other SDK-based tool on this list, new flags require an SDK ship.

**Pick ConfigCat if** your real need is cheap, reliable feature flags for a small team and experimentation is a nice-to-have you'll wire up later. See our full breakdown on this tradeoff in [Feature Flags vs A/B Testing](/blog?cluster=feature-flags-alternatives).

### 7. Fleack — for marketing teams whose bottleneck is the dev loop

**Best for:** mobile gaming studios and consumer-app growth teams where marketing wants to run 20+ tests a month and engineering can't absorb the request volume.

**What it does well.** Fleack sits between your mobile app and your backend as an edge layer — see [how it works](/#how-it-works). It intercepts the API traffic, uses AI to detect the parameters worth testing (prices, ad frequencies, interstitial cadence, reward multipliers, onboarding flows), and lets marketers launch A/B tests on them without an SDK, without a rebuild, and without an App Store review. The UI is built for non-technical users: plain-English test creation, natural-language segments, multi-metric Bayesian results. Marketing self-serves, engineering stays out of the loop.

**Honest limitations.** Not a feature-flag management tool — governance workflows, approval chains, stale-flag cleanup are intentionally out of scope. The [hidden cost of SDK-based experimentation platforms](/blog?cluster=feature-flags-alternatives) is what Fleack removes; if you *need* that SDK discipline, LaunchDarkly is still the right answer. And Fleack is early-stage — this isn't Series-D infrastructure.

**Pick Fleack if** marketing is the intended user, mobile is the primary surface, and you've stopped pretending LaunchDarkly is "self-serve".

## If you need X, pick X. If your real problem is marketing access, none of them.
Straight out of how teams actually decide:

- **If you need LaunchDarkly-style governance** (approval workflows, SOC 2, HIPAA): pick **Split** or stay on LaunchDarkly and negotiate.
- **If you need serious stats**: pick **Statsig**.
- **If you want Optimizely's marketer UX** and mobile is secondary: pick **Optimizely**.
- **If you're an enterprise marketing org on web**: pick **AB Tasty** or **Kameleoon**.
- **If you want cheap reliable flags**: pick **ConfigCat**.

But if your actual problem is that marketing can't run a test on mobile without filing a Jira ticket — *none of them are built for that*. Every product on this list except Fleack assumes the experiment loop is "marketer → ticket → engineer → SDK update → release → store review → live". For a mobile growth team that ships weekly, that loop *is* the problem.

That's the gap Fleack fills. See our take on [marketer-led experimentation](/blog/marketer-led-experimentation-why-your-growth-team-shouldnt-wait-on-engineering) for the category argument. Or keep reading for the head-to-head on the three names most often shortlisted alongside LaunchDarkly: [Statsig vs LaunchDarkly vs Firebase](/blog/statsig-vs-launchdarkly-vs-firebase-which-one-for-mobile).

## Quick-look comparison

| Tool | Marketer-friendly UX | Mobile-first | No-SDK | Starting price |
| ---- | -------------------- | ------------ | ------ | -------------- |
| LaunchDarkly | Low | Yes | No | ~$10–20/seat, 25-seat min |
| Split (Harness FME) | Low | Yes | No | Comparable to LaunchDarkly |
| Statsig | Low | Yes | No | Free tier, event pricing |
| Optimizely Feature Experimentation | Medium-high | Web-leaning | No | Quote |
| AB Tasty | High | Yes | No | Enterprise quote |
| Kameleoon | High | Web-leaning | No | Quote |
| ConfigCat | Medium | Yes | No | ~$99/mo |
| Fleack | **High** | **Yes** | **Yes** | Free tier |

## Where LaunchDarkly still wins

One honest point.
If you have a multi-service platform where feature flags are an engineering discipline — deploy decoupled from release, safe rollouts, progressive enable-by-region — LaunchDarkly's governance surface is *the* reference. Their SDKs are rock-solid. Their SOC 2 and HIPAA posture will satisfy any compliance team. If that's your primary use case, renew and don't look at this list.

The question is whether "feature-flag governance" is actually what your team needs. For a marketing org running growth experiments on a mobile app, the answer is almost never yes.

Run this check. Over the last 90 days, how many tests did marketing *launch themselves* in LaunchDarkly, with zero engineering involvement? If the number is near zero, LaunchDarkly is not what's blocking you — the dev loop is. A cheaper or prettier flag tool won't fix that. Neither will more seats.

## The question every alternative dodges

Every tool in this list assumes one constant: that experiments need an SDK. That's true for feature flags (you need an SDK to evaluate them at runtime with millisecond latency). It's not true for most A/B tests on mobile, which are about varying values returned from your backend — values that already travel over the network and can be rewritten at the edge.

If that resonates, Fleack is worth 30 minutes. If it doesn't — pick the tool above that matches your actual constraint and ship.

---

# Marketer-Led Experimentation: Stop Waiting on Engineering

URL: https://fleack.io/blog/marketer-led-experimentation-why-your-growth-team-shouldnt-wait-on-engineering
Author: Ayrton Lecoutre
Published: 2026-04-19
Description: Every A/B testing platform assumes experiments need engineers. On mobile, that assumption is the bottleneck — not stats, not price. A case for the category.

Three weeks ago I watched a head of growth at a mid-core mobile gaming studio pitch an A/B test in a standup. The idea was good — a 7-day discount on the new battle pass for players whose ARPU had dropped below the median.
The goal: recover 3% of churning whales before the winter content drop.

The test launched nine weeks later. Two iOS releases in, one parameter-shape migration, one App Store review that stretched to four days because of a privacy manifest change, one Android rollout held up by a translator sign-off. By the time the variant was live, the content drop had shipped, the churning cohort had left, and the test measured a problem that had moved.

The platform they use — one of the expensive ones you've heard of — has a great stats engine. That's not the part that failed.

## The two-word summary of why growth teams stall on mobile

**The SDK.**

Every established experimentation platform — LaunchDarkly, Statsig, Optimizely, Firebase, Split, Flagsmith, Unleash, ConfigCat — assumes the same mobile test loop:

1. Marketer proposes a test.
2. Engineer adds a typed parameter to the SDK integration.
3. New build. Release branch. QA.
4. App Store review (median ~24 hours, but edge cases stretch; Apple [publishes that 90% complete within 48](https://developer.apple.com/app-store/review/)). Play Store rollout staging.
5. Wait for user adoption of the new client version.
6. Test goes live — weeks after the idea.

That loop is fine at one test per quarter. It breaks at one test per week. And for a mobile growth team trying to catch a weekend event, it's unshippable.

## The "self-serve experimentation" claim keeps failing for a reason

Go read any feature-flag vendor's website. Marketer self-serve is on the homepage. It's been on the homepage for eight years.

The claim isn't a lie. It's technically correct: a marketer *can* toggle a flag that already exists, edit targeting rules on that flag, and see reporting. What they can't do is:

- Create a *new* parameter type without engineering shipping it into the SDK.
- Change a parameter's shape (string → object, scalar → list) without a release.
- Run an experiment on a parameter that hasn't been instrumented on the client.
- Test anything that the engineering team didn't anticipate six months ago when they wired the integration.

That's why LaunchDarkly seats don't turn into throughput. It's not a training issue. It's not a UI issue. It's a *product architecture* issue: the SDK is the contract, and the contract is engineering-owned.

We covered the marketer-access gap across named vendors in [LaunchDarkly alternatives for marketing teams](/blog/launchdarkly-alternatives-marketing-teams-2026) and in the three-way [Statsig vs LaunchDarkly vs Firebase](/blog/statsig-vs-launchdarkly-vs-firebase-which-one-for-mobile) head-to-head. Same pattern every time.

## What "marketer-led" actually has to mean

Marketer-led experimentation is not a process goal. It's a product requirement. Specifically:

### 1. A marketer can ship a test end-to-end without an engineer

Not "without training engineering on the new platform once". Without an engineer in the loop of *that specific test*. Zero Jira tickets. Zero SDK changes. Zero release coordination.

### 2. The test reaches end users on whatever app version they're running

Not "users on the latest version pick up the test". Every user on every shipped version, immediately, the moment the test is configured.

### 3. The parameter space is defined by the app's behaviour, not by pre-instrumented hooks

Not "here are the 40 parameters engineering wired up in Q2; pick one". The parameter space is whatever the app asks from the backend — prices, ad frequencies, reward tables, copy, UI text, feature toggles, onboarding flow order, anything that travels over the network.

### 4. Stats quality doesn't regress

"No engineers" shouldn't mean "no CUPED, no sequential testing, no proper multi-metric FDR correction". Those are table stakes. Marketer-led doesn't mean amateur-hour.

### 5. Engineering is still in control of what's tolerated

Not "marketing runs wild on production".
The backend team defines what's safe to vary, the test harness enforces it, and rollback is a button. Marketing self-serves within a sandbox engineering established once, not per-test.

Any "marketer-led experimentation" product that doesn't hit all five is a marketing claim, not a product.

## Why this wasn't possible until now

Three things had to converge.

**Edge compute got cheap.** Ten years ago, intercepting every API call between a mobile app and its backend at line-rate was a six-figure infrastructure project. Today it's a Cloudflare Worker or equivalent, ~40ms p95 latency, priced per request.

**LLMs got good at parameter identification.** Knowing which fields in a JSON response are "parameters worth testing" (price, cadence, copy) vs "structural data" (user ID, metadata) used to require a human analyst scanning the response. GPT-class models do it reliably in seconds, including edge cases like nested arrays and localised content.

**Backend APIs became the source of truth.** Ten years ago, mobile apps bundled their config at build time. Today, basically every app of consequence fetches configuration, pricing, and content from a backend on launch — often on every session. Those fetches are the leverage point.

Put those three together and the SDK stops being the only way to run A/B tests on mobile. You intercept the backend response at the edge, identify the parameters, rewrite them per-variant based on the user segment, and log exposures. No client code changes, no SDK, no store review.

That's the architecture Fleack runs. See [how it works](/#how-it-works) for the details.

## The honest case against marketer-led experimentation

This is where dev-first tools still win, and I'd be lying to pretend otherwise.

**Compliance-heavy regulated industries.** If every flag change needs a four-eyes approval with audit trail, SOC 2 Type II audit logging, and role-based access to specific environment tiers, LaunchDarkly is the right answer.
Marketer-led means fewer gates, and fewer gates means more incidents in regulated domains.

**Offline-first apps.** If your app has to run A/B-tested behaviour with no network connection, the variant has to be resident on the client. That needs an SDK evaluating a cached flag ruleset. No edge-layer interception works when the edge doesn't exist.

**Feature *flags* specifically.** Feature flags are not A/B tests. Governance workflows (staged rollouts, stale-flag cleanup, feature-level kill switches) are a legitimate engineering discipline and belong in engineering-owned infrastructure. We unpack the distinction in [Feature Flags vs A/B Testing: Are They the Same Thing?](/blog?cluster=feature-flags-alternatives).

**Stats research.** If your experimentation function is three data scientists shipping methodology papers, Statsig is the right home. The tooling for sequential tests, CUPED holdouts and multi-treatment orthogonal designs is built for that audience.

If you recognise your team in the above, keep your current stack. This article is not for you. It is for the 80% of mobile growth teams who don't have any of those constraints, who just want to test a price, a frequency, a reward table, and who have had "this is blocked on engineering" in their weekly standup for two years.

## What you lose, what you gain

| What you keep from dev-first tools | What marketer-led changes |
| ---------------------------------- | ------------------------- |
| Stats quality (Bayesian, multi-metric) | No more SDK updates per test |
| Segmentation on dynamic attributes | No more App Store review waits |
| Audit of who changed what | No more Jira queue on the growth roadmap |
| Rollback to control | A much shorter feedback loop |

You do *not* lose rigour. You lose the dev loop.
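
To make the edge-rewrite idea from "Why this wasn't possible until now" more concrete, here is a minimal sketch of the three steps it names: assign a variant, rewrite a backend parameter, log the exposure. It is an illustration under stated assumptions, not Fleack's implementation. The experiment shape, the `ads.interstitial_every_n_levels` field, and every function name are hypothetical, segment targeting is left out for brevity, and a plain list stands in for an analytics sink.

```python
import hashlib
import json
from typing import Any

# Hypothetical experiment definition; all field names are illustrative only.
EXPERIMENT = {
    "id": "interstitial_frequency_v1",
    "parameter": "ads.interstitial_every_n_levels",  # dotted path into the JSON body
    "variants": {"control": None, "A": 2, "B": 5},   # None = keep the backend value
}


def assign_variant(user_id: str, experiment_id: str, variants: list[str]) -> str:
    """Deterministic bucketing: the same user and experiment always map to the same variant."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def set_path(body: dict[str, Any], dotted_path: str, value: Any) -> None:
    """Overwrite a nested field addressed by a dotted path, only if it already exists."""
    *parents, leaf = dotted_path.split(".")
    node: Any = body
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return  # path is absent from this response; leave the body untouched
    if leaf in node:
        node[leaf] = value


def rewrite_response(raw_body: str, user_id: str, exposure_log: list[dict]) -> str:
    """Rewrite one backend JSON response for one user and record the exposure."""
    body = json.loads(raw_body)
    names = sorted(EXPERIMENT["variants"])
    variant = assign_variant(user_id, EXPERIMENT["id"], names)
    override = EXPERIMENT["variants"][variant]
    if override is not None:
        set_path(body, EXPERIMENT["parameter"], override)
    exposure_log.append(
        {"experiment": EXPERIMENT["id"], "user": user_id, "variant": variant}
    )
    return json.dumps(body)
```

Hashing the experiment ID together with the user ID keeps assignment stable across sessions without storing any state at the edge, which is one common way to do it. A real deployment would also need a segment check before assignment and deduplicated exposure logging.
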
## What this looks like in practice

A typical week for a growth team on a marketer-led platform:

**Monday.** Head of growth opens the backoffice, asks the AI copilot "test three interstitial frequencies on players below the median ARPU, run for seven days, measure impact on D7 revenue". The copilot proposes a test: control (current frequency), variant A (higher frequency, higher eCPM risk), variant B (lower frequency, retention hedge). Segment: "ARPU 7-day trailing below median, platform is iOS or Android, country in Tier 1". Variants are live in two minutes.

**Thursday.** Growth reviews the exposure count (2,400 users exposed, healthy sample), sees variant A lifting D2 revenue +4.1% with 87% Bayesian confidence, variant B flat. Decides to let it run through the weekend.

**Monday following.** 94% confidence on variant A. One click promotes it to 100% of the segment. Engineering was never in the loop. Nobody shipped code. The App Store didn't know.

The tool is the difference. Not the process. Not the training. Not the stakeholder buy-in.

## The part where I admit this is self-interested

I build Fleack. Fleack is an A/B testing platform that runs on edge interception of API traffic for mobile apps. No SDK, no rebuild, no store review. It's designed specifically to make marketer-led experimentation real for mobile teams.

Category pitches by category creators deserve scepticism. Two things to know:

First, I'd have written this article the same way if I ran any other edge-based experimentation tool. The argument is about the *category*, not the vendor. If you read this and go shop around, please do. There are going to be more of us by 2027.

Second, Fleack is early. The data-science-grade stats engine is there. The marketer UI is there. The AI copilot for test creation is there. The governance and compliance surface that a regulated enterprise needs is *not* there yet. If you're in that bucket, use LaunchDarkly and talk to us in 2027.

For everyone else on a mobile growth team: the three-week standup gap between idea and live test is not a law of nature. It's a choice your tooling is making for you.

## One question to run on your team today

Pull the last 10 tests marketing proposed. For each one, count days from "I want to test X" to "X is live for users". Now count how many of those days were spent on engineering work that was specific to *that test*, not shared setup that would have been paid anyway.

If the answer is more than one day per test, the rest is the SDK tax. And the SDK tax isn't a discount-negotiation problem or a better-UI problem. It's a "do we still need an SDK for this?" problem.

If you're ready to ask that question seriously, [see the part where Fleack removes the tax](/#features). If you're not, no judgement, but your next three roadmap quarters will look a lot like the last three.

"We didn't realise how much of our 'experimentation strategy' was really 'what engineering had capacity to instrument' until we stopped needing them. Six months later we're running 30 tests a month with the same marketing headcount and the same engineering team, just not on the same queue."

## Further reading

- [Firebase Remote Config alternatives for mobile A/B testing](/blog/firebase-remote-config-alternatives-mobile-ab-testing): honest breakdown of what Firebase and its dev-first replacements actually cost.
- [LaunchDarkly alternatives for marketing teams](/blog/launchdarkly-alternatives-marketing-teams-2026): the marketer-access gap, vendor by vendor.
- [Statsig vs LaunchDarkly vs Firebase: which one for mobile](/blog/statsig-vs-launchdarkly-vs-firebase-which-one-for-mobile): the three-way head-to-head, and the fourth question it dodges.
- [The hidden cost of SDK-based experimentation platforms](/blog?cluster=feature-flags-alternatives): put a number on the dev-loop tax.

---

# Statsig vs LaunchDarkly vs Firebase: Which One for Mobile in 2026?
URL: https://fleack.io/blog/statsig-vs-launchdarkly-vs-firebase-which-one-for-mobile
Author: Ayrton Lecoutre
Published: 2026-04-19
Description: Head-to-head on stats, mobile support, marketer UX, pricing and the dev-loop tax. Honest verdict per dimension, plus when none of them are the right answer.

You've narrowed it to three. Your data scientist wants Statsig. Your head of platform wants LaunchDarkly. Your CFO likes that Firebase is free. Your head of growth just wants to ship a battle-pass pricing test before Monday.

Here's the honest head-to-head, scored on the dimensions that actually matter for a mobile team in 2026.

## What each one is actually for (30-second version)

### Statsig: a stats engine with feature flags bolted on

Founded in 2021 by ex-Facebook experimentation engineers. Full stats stack: sequential testing, CUPED variance reduction, holdouts, pre-exposure checks, multi-metric FDR correction. Dynamic config is the primitive that gets confused with feature flags, but it's typed remote config evaluated client-side.

### LaunchDarkly: a feature-flag governance system with experiments bolted on

Founded in 2014, the reference for feature flags as infrastructure. Best-in-class SDK reliability, approval workflows, audit trails, stale-flag cleanup, enterprise compliance (SOC 2 Type II, HIPAA). Experimentation was added in 2022: functional, not best-in-class.

### Firebase A/B Testing: a free remote-config layer that happens to run tests

Part of the Firebase suite, with tight integration with Analytics, BigQuery and Crashlytics. Built on [Google Optimize](https://support.google.com/optimize/answer/12979939), which Google deprecated on September 30, 2023; the Firebase surface still runs, but no new stats primitives have shipped since. Our deeper take: [Firebase Remote Config alternatives](/blog/firebase-remote-config-alternatives-mobile-ab-testing).

## Head-to-head on the dimensions that matter

### Which has the best stats engine?
**Verdict: Statsig, by a significant margin.** LaunchDarkly a distant second, Firebase a distant third.

Statsig ships what an experimentation team actually wants: sequential testing (stop early when significant, no peeking penalty), CUPED (pre-experiment covariate adjustment for 20–50% variance reduction), holdouts, pre-exposure bias checks, and proper FDR correction when tracking multiple metrics. The methodology is public in their [engineering blog](https://www.statsig.com/blog).

LaunchDarkly's experimentation layer covers the basics (frequentist A/B with p-value reporting, basic Bayesian output) but lacks CUPED and sequential testing at most tiers. For a 2–3% lift, you'll run longer than you need to.

Firebase gives you "variant B is 62% likely to beat A" and not much more. Single-metric, lightweight Bayesian, no sequential, no multivariate. Enough for hobby projects, not enough to ship pricing decisions.

### Which handles mobile best?

**Verdict: Statsig and Firebase tie on mobile fit. LaunchDarkly third.**

All three have first-class iOS, Android and cross-platform (Unity / Flutter / React Native) SDKs. The differences are at the edges.

Statsig's mobile SDKs get the same methodology primitives as their server SDKs: you can run a CUPED-corrected test entirely on client-side assignment. Firebase's SDKs are mobile-first by heritage. LaunchDarkly's SDKs are *excellent*, but their experimentation layer was built for web server-side evaluation first; mobile experimentation works but wasn't the design centre.

Common ground: all three require the mobile SDK shipped with your binary. **New parameter = new release.** That's the tax none of them removes.

### Which lets marketers self-serve?

**Verdict: None of them, and don't let the sales pitch tell you otherwise.**

All three are developer-first tools. The UIs are indexed by flag trees, environment toggles, evaluation rules. Marketers can *read* and edit existing flags.
They cannot:

- Create a new parameter type
- Change a parameter shape
- Run a meaningful experiment on a parameter engineering hasn't wired up yet

This is the marketer-facing usability pattern covered in depth in our [LaunchDarkly alternatives guide](/blog/launchdarkly-alternatives-marketing-teams-2026): the "self-serve" framing overpromises on every dev-first flag platform, LaunchDarkly included.

### Which has real segmentation flexibility?

**Verdict: Statsig.**

Statsig targets arbitrary user attributes client-side: whatever object you pass to their SDK, you can target on it. ARPU bucket, player level, days since install and LTV tier are all first-class.

LaunchDarkly supports the same concept with custom attributes, but the targeting UI feels like writing rules for a proxy ACL, which is what it is.

Firebase targets on predefined dimensions (country, platform, version, Firebase Audience). Anything dynamic has to flow through BigQuery, get exported into a Firebase Audience, and refresh every 24 hours. For time-sensitive segments, that lag is fatal.

### What do they cost at 2M MAU and 20 tests/month?

**Verdict: highly dependent on event volume and headcount, and the "cheapest" answer is rarely the real cost.**

Rough reality check (public sources, list pricing, your mileage will vary):

- **Statsig**: Free tier covers 1M events/month. Above that, event-based pricing kicks in. An active mobile app at 2M MAU can easily do tens of millions of events per month, landing you in the low-five- to low-six-figure annual range depending on experiment volume.
- **LaunchDarkly**: Foundation plan ~$10–20/seat/month with a 25-seat minimum on annual billing; Enterprise tier is quote-only. Experimentation is an add-on. At 2M MAU with experimentation on, teams [consistently report](https://www.trustradius.com/products/launchdarkly/reviews) six-figure annual bills.
- **Firebase A/B Testing**: Free.
  But the real cost is measured in Jira tickets: at one test per quarter, free is actually free; at one per week, the SDK-rebuild tax eats any saving.

### Which requires an SDK?

**Verdict: all three.** This is the common ground that the three-way debate usually skips.

Every one of these products assumes the mobile test loop is: marketer asks → engineer adds a flag or config key → SDK release → App Store review → experiment live. On iOS, Apple publishes that [roughly 50% of App Store reviews complete within 24 hours and 90% within 48](https://developer.apple.com/app-store/review/); edge cases stretch to days. On Android, user auto-update timing adds weeks of tail.

Multiply that by 20 tests a month and the SDK cadence is the throughput limit, not the platform's feature list.

## Full comparison matrix

| Dimension | Statsig | LaunchDarkly | Firebase A/B Testing |
| --------- | ------- | ------------ | -------------------- |
| Stats engine | **Strong** (CUPED, sequential, FDR) | Medium | Light (ex-Optimize) |
| Feature-flag governance | Light | **Strong** (SOC 2, HIPAA, approvals) | Light |
| Mobile SDK quality | Strong | Strong | Strong |
| Marketer self-serve | Low | Low | Low |
| Segmentation flexibility | **Strong** (arbitrary attrs) | Medium | Light (predefined + BigQuery) |
| Starting price | Free (1M events) | ~$10–20/seat, 25-seat min | Free |
| No-SDK option | No | No | No |
| Best for | Data-science-owned experiments | Enterprise flag governance | Small teams, ≤2 tests/quarter |

## Decision tree in 30 seconds

```
Start
│
├─ Is stats quality the primary decision driver?
│   └─ Yes → Statsig
│
├─ Do you need flag governance (approvals, SOC 2, HIPAA audit)?
│   └─ Yes → LaunchDarkly
│
├─ Are you running 1–2 tests per quarter and already deep in the Google stack?
│   └─ Yes → Firebase, don't overthink it
│
└─ Is the bottleneck marketers waiting on engineering?
    └─ None of the above. See below.
```

## The question none of the three answers

All three products share one implicit assumption: that mobile A/B testing needs a client SDK. That assumption is load-bearing: remove it and the whole dev-loop tax disappears.

It's true for *feature flags*, where millisecond-latency evaluation matters and offline behaviour is critical. It's *not* true for most A/B tests on mobile, which test values returned from the backend: pricing, ad frequency, reward rates, copy, UI parameters. Those values already travel the network on every call. They can be rewritten at the edge without the client knowing.

That's the axis [Fleack](/#how-it-works) sits on. Not a better Statsig, not a cheaper LaunchDarkly, not a Firebase replacement. A different question entirely: what if marketing could test mobile parameters on live API traffic without engineering shipping anything?

If that resonates, the natural follow-up is our take on [marketer-led experimentation](/blog/marketer-led-experimentation-why-your-growth-team-shouldnt-wait-on-engineering), the category that the three-way debate was too narrow to admit existed. Also worth reading: [the hidden cost of SDK-based experimentation platforms](/blog?cluster=feature-flags-alternatives).

## When to still pick one of the three

One honest pass at each:

- **Pick Statsig** if your primary constraint is "trust the stats" and you have a data function that will own the platform. Their methodology is real; their UI expects a power user.
- **Pick LaunchDarkly** if flag governance is a board-level concern: regulated industry, SOC 2/HIPAA audit trails, multi-service platform where deploy-decoupled-from-release is a policy.
- **Pick Firebase A/B Testing** if you run 1–2 tests per quarter, you're already on Firebase, and "free" outweighs the stats limitations. Above 5 tests/month, the SDK tax exceeds what you'd pay for a dedicated tool, at which point you're back on this list.
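
For teams that prefer a checklist to a diagram, the decision tree and the three picks above can be condensed into a small function. This is only a sketch of the article's own branching order. The `TeamProfile` fields and the `pick_tool` name are hypothetical, and the final fallback is an assumption for teams that match none of the branches.

```python
from dataclasses import dataclass


@dataclass
class TeamProfile:
    """Hypothetical answers to the four questions in the decision tree."""
    stats_quality_is_primary: bool
    needs_flag_governance: bool
    tests_per_quarter: int
    deep_in_google_stack: bool
    marketers_blocked_on_engineering: bool


def pick_tool(team: TeamProfile) -> str:
    # Branches are checked in the same order as the tree: the first "yes" wins.
    if team.stats_quality_is_primary:
        return "Statsig"
    if team.needs_flag_governance:
        return "LaunchDarkly"
    if team.tests_per_quarter <= 2 and team.deep_in_google_stack:
        return "Firebase A/B Testing"
    if team.marketers_blocked_on_engineering:
        return "None of the three"
    # Assumption: the tree does not cover this case, so flag it for a closer look.
    return "No clear match"
```

Because the first matching branch wins, a team that needs both strong stats and flag governance resolves to Statsig here; reorder the checks if governance is the harder constraint for you.
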

Before signing any contract here, count the tests marketing *actually shipped* in the last 90 days, not the ones they proposed. If it's under 4, the bottleneck isn't Statsig vs LaunchDarkly vs Firebase. It's the ideas-to-variant pipeline. Migrating tools without fixing that pipeline will disappoint on whatever you pick.

## One-line summary

- **Statsig** is the data scientist's pick.
- **LaunchDarkly** is the platform team's pick.
- **Firebase** is the default-because-it's-free pick.
- **None of them is the marketer's pick**, and on a mobile-first team, that's usually whose request is actually waiting in the backlog.

If that sentence lands, [see how Fleack fits](/#features). If it doesn't, pick the tool above that matches your actual constraint and ship.

---