In the online world, “availability” is not an abstract metric but a direct measure of profit and reputation. Uptime shows what percentage of the time a site or service actually worked and was available to visitors. The closer it is to 100%, the less often customers run into errors: carts don’t “fall over,” payment pages open quickly, and support isn’t on fire every evening. When you see “99%” in a provider’s offer, it sounds impressive; in reality it allows more than 7 hours of downtime per month, and the business impact of those hours can be noticeable even for small projects.
What is uptime and how is it calculated
Uptime is the ratio of failure-free operating time to total time for a period:
Uptime = (Failure-free operating time ÷ Total time) × 100%.
In practice, different statuses are taken into account: full outage, partial degradation (for example, the API is unavailable but the site opens), planned maintenance, and force majeure. It is important to understand the calculation methodology: some providers exclude planned downtime from the calculation, others include it; some start the downtime timer only after N minutes of continuous unavailability. These are nuances hidden in the SLA that affect the real percentage.
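To see how much the methodology alone can move the number, here is a minimal Python sketch. The `Outage` record, the `exclude_planned` switch, and the `min_counted_minutes` threshold are illustrative assumptions, not any provider's actual rules; in particular, it assumes that outages shorter than the threshold are ignored entirely.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outage:
    minutes: float         # length of the unavailability window
    planned: bool = False  # True for announced maintenance


def uptime_percent(total_minutes: float, outages: Iterable[Outage],
                   exclude_planned: bool = False,
                   min_counted_minutes: float = 0.0) -> float:
    """Uptime for one period under a given (assumed) counting methodology."""
    downtime = 0.0
    for outage in outages:
        if exclude_planned and outage.planned:
            continue  # methodology: planned maintenance is not downtime
        if outage.minutes < min_counted_minutes:
            continue  # methodology: short blips are not counted at all
        downtime += outage.minutes
    return (total_minutes - downtime) / total_minutes * 100


MONTH_MINUTES = 30 * 24 * 60  # 30-day month

# Hypothetical month: one maintenance window, two short blips, one real incident.
outages = [Outage(120, planned=True), Outage(4), Outage(4), Outage(30)]

strict = uptime_percent(MONTH_MINUTES, outages)
lenient = uptime_percent(MONTH_MINUTES, outages,
                         exclude_planned=True, min_counted_minutes=5)
print(f"strict:  {strict:.2f}%")   # 99.63%
print(f"lenient: {lenient:.2f}%")  # 99.93%
```

The same month of incidents lands either below or above a 99.9% target depending only on what the SLA agrees to count.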
SLA math: why “99%” is a lot of downtime
Let’s convert percentages into understandable minutes and hours within a 30-day month:
99% ≈ 7 h 12 m of downtime per month and ≈ 3 days 15 h per year;
99.9% ≈ 43 m per month and ≈ 8 h 45 m per year;
99.99% ≈ 4–5 m per month and ≈ 52 m per year.
The difference between 99% and 99.9% is a chasm in lost opportunities. For an online store with an average revenue of 5,000 UAH/hour, roughly 7 hours of downtime means about 35,000 UAH in direct revenue loss, plus indirect losses from checkout failures, cancellations, and reduced trust.
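The conversion from a percentage to a downtime allowance is simple arithmetic. The sketch below assumes a 30-day month (720 hours) and a 365-day year; the function names are illustrative.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime that an SLA percentage permits within a period."""
    return period_hours * 60 * (100 - sla_percent) / 100


def fmt(minutes: float) -> str:
    """Render minutes as 'D d H h M m', rounded to the nearest minute."""
    total = round(minutes)
    days, rem = divmod(total, 24 * 60)
    hours, mins = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days} d")
    if hours:
        parts.append(f"{hours} h")
    parts.append(f"{mins} m")
    return " ".join(parts)


MONTH_HOURS = 30 * 24   # 30-day month
YEAR_HOURS = 365 * 24   # non-leap year

for sla in (99.0, 99.9, 99.99):
    per_month = fmt(allowed_downtime_minutes(sla, MONTH_HOURS))
    per_year = fmt(allowed_downtime_minutes(sla, YEAR_HOURS))
    print(f"{sla}%: {per_month} per month, {per_year} per year")
```

For 99% this gives 432 minutes, that is 7 h 12 m, per 30-day month.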
Impact on business, SEO, and reputation
Availability is a component of user experience and therefore of conversion. If the payment form “spins” without responding, the customer leaves. Search engines take stability into account: when a crawler receives errors several times in a row, pages can lose positions, and recovery takes time and ad budget. Partners and integrations (ERP, CRM, payment gateways) also expect predictability; every outage window disrupts reporting and can trigger SLA penalties that you, in turn, owe to your own customers.
Not just percentages: MTTR, MTBF, and SLO
Even excellent percentages won’t save you if MTTR (Mean Time To Repair), the average recovery time, is large. Two providers with the same 99.9% can feel very different: one has short incidents that are closed quickly, the other has rare incidents that last for hours. MTBF (Mean Time Between Failures) and SLO (Service Level Objective, the internal target a company sets for itself) help plan reliability and availability so that the numbers aren’t cosmetic.
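A rough way to compare two such providers is to compute MTTR and MTBF from an incident log. The sketch below uses one common convention (MTBF as operating time divided by the number of failures) and two hypothetical providers with the same total downtime in a 30-day month.

```python
from typing import Sequence


def mttr(repair_minutes: Sequence[float]) -> float:
    """Mean Time To Repair: average length of an incident."""
    return sum(repair_minutes) / len(repair_minutes)


def mtbf(period_minutes: float, repair_minutes: Sequence[float]) -> float:
    """Mean Time Between Failures: operating time divided by failure count."""
    return (period_minutes - sum(repair_minutes)) / len(repair_minutes)


def availability_percent(period_minutes: float,
                         repair_minutes: Sequence[float]) -> float:
    """Availability expressed through MTBF and MTTR."""
    up = mtbf(period_minutes, repair_minutes)
    down = mttr(repair_minutes)
    return up / (up + down) * 100


MONTH_MINUTES = 30 * 24 * 60

# Hypothetical incident logs, both adding up to 40 minutes of downtime.
provider_a = [5] * 8   # eight short incidents
provider_b = [40]      # one long incident

for name, log in (("A", provider_a), ("B", provider_b)):
    print(name,
          f"MTTR={mttr(log):.0f} min",
          f"MTBF={mtbf(MONTH_MINUTES, log):.0f} min",
          f"availability={availability_percent(MONTH_MINUTES, log):.3f}%")
```

Both logs produce the same availability figure, yet provider B's single incident takes eight times longer to resolve, which is the difference customers actually feel.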
SLA: what must be specified
Guaranteed percentage and calculation period (month/quarter).
Targets for MTTR and time to first response (for example, ≤15 minutes).
Compensation table for violating target values.
Clause about exclusions (force majeure, DDoS beyond filtering).
Escalation channels and incident reporting.
If you rent VPS or use server placement, carefully compare the SLA text itself rather than the marketing banner.
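When comparing SLA texts, it can help to encode each compensation table and check what a given month would actually pay back. The tiers below are purely hypothetical placeholders, not taken from any real provider; replace them with the values from the contract you are reviewing.

```python
from typing import List, Tuple

# HYPOTHETICAL tiers: (minimum measured uptime in %, credit in % of monthly fee).
CREDIT_TIERS: List[Tuple[float, int]] = [(99.9, 0), (99.0, 10), (95.0, 25)]
FLOOR_CREDIT = 50  # HYPOTHETICAL credit when uptime falls below every tier


def service_credit_percent(measured_uptime: float,
                           tiers: List[Tuple[float, int]] = CREDIT_TIERS,
                           floor: int = FLOOR_CREDIT) -> int:
    """Return the credit owed for a month with the given measured uptime."""
    # Walk the tiers from the strictest threshold down to the loosest.
    for min_uptime, credit in sorted(tiers, reverse=True):
        if measured_uptime >= min_uptime:
            return credit
    return floor


for uptime in (99.95, 99.5, 97.0, 90.0):
    print(f"{uptime}% uptime -> {service_credit_percent(uptime)}% credit")
```

Comparing the credit with the estimated business loss for the same month shows quickly whether the compensation clause is meaningful or symbolic.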
High-availability architectures: how to actually raise uptime
Geo-distribution: active-active or active-passive between sites, DB replication, automatic failover.
Balancing and self-healing: health checks, removing “sick” instances, autoscaling.
CDN and edge caching: reducing load on the origin, so that a failure shows up as local timeouts rather than a total crash.
Decomposition: isolating critical services so that a local degradation doesn’t bring everything down.
Secret storage and TLS: centralized key rotation, HSTS, OCSP stapling, timely renewal of SSL certificates so you don’t “drop” the site with an expired cert.
Autonomy from a single provider/carrier: redundant uplinks, BGP multihoming (for large ones).
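The health-check idea from the list above can be sketched in a few lines. This is a simplified model, not a real load balancer: the `probe` callable, the instance names, and the threshold of three consecutive failures are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


class InstancePool:
    """Tracks consecutive failed health checks per instance."""

    def __init__(self, instances: Iterable[str],
                 probe: Callable[[str], bool],
                 fail_threshold: int = 3) -> None:
        self.probe = probe                    # returns True if the instance is healthy
        self.fail_threshold = fail_threshold  # failures in a row before removal
        self.failures: Dict[str, int] = {name: 0 for name in instances}

    def run_checks(self) -> None:
        """Probe every instance once and update its failure counter."""
        for name in self.failures:
            if self.probe(name):
                self.failures[name] = 0       # a success restores the instance
            else:
                self.failures[name] += 1

    def healthy(self) -> List[str]:
        """Instances that should currently receive traffic."""
        return [name for name, count in self.failures.items()
                if count < self.fail_threshold]


# Simulated outage: "app-2" fails every probe until it is "repaired".
down = {"app-2"}
pool = InstancePool(["app-1", "app-2", "app-3"], probe=lambda n: n not in down)

for _ in range(3):
    pool.run_checks()
print(pool.healthy())  # ['app-1', 'app-3']

down.clear()           # the instance recovers
pool.run_checks()
print(pool.healthy())  # ['app-1', 'app-2', 'app-3']
```

Requiring several consecutive failures before removal avoids ejecting an instance over a single slow response, at the cost of a slightly longer detection time.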
Mini loss calculator and investment priorities
Simple estimate: Loss = Downtime (hours) × Average revenue/hour + SLA penalties + reputational losses.
Compare: increasing infrastructure cost by 10–20% is often cheaper than one or two serious failures during peak hours. In the project plan, budget for redundancy of critical nodes and regular DR drills — they pay off.
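The estimate above translates directly into code. In this sketch the revenue figure of 5,000 UAH/hour comes from the earlier example, while the monthly infrastructure cost of 20,000 UAH is a hypothetical number chosen only to illustrate the comparison.

```python
def downtime_loss(downtime_hours: float, revenue_per_hour: float,
                  sla_penalties: float = 0.0,
                  reputational_losses: float = 0.0) -> float:
    """Loss = downtime x average revenue per hour + penalties + reputation."""
    return downtime_hours * revenue_per_hour + sla_penalties + reputational_losses


def extra_infra_pays_off(monthly_infra_cost: float, increase_fraction: float,
                         avoided_loss: float) -> bool:
    """True if the added infrastructure spend is below the loss it avoids."""
    return monthly_infra_cost * increase_fraction < avoided_loss


loss = downtime_loss(downtime_hours=7, revenue_per_hour=5_000)
print(loss)  # 35000

# HYPOTHETICAL budget: 20,000 UAH/month, raised by 20% for redundancy.
print(extra_infra_pays_off(20_000, 0.20, avoided_loss=loss))  # True
```

Reputational losses are the hardest term to estimate; leaving them at zero gives a conservative lower bound on the real cost of an outage.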
30-day plan: how to quickly raise actual uptime
Week 1. SLA audit, dependency inventory, enabling external monitoring, checking TLS expiration dates.
Week 2. Setting alerts and escalations, speeding up caching, enabling CDN for static assets.
Week 3. DB replication and a standby backend (at least passive), failover test on staging.
Week 4. DR drill, documenting RPO/RTO targets, updating runbooks, adjusting on-call.
In parallel, agree with the provider on maintenance windows and emergency communication channels. If turnover grows, consider switching to a more powerful VPS plan or to geo-distributed server placement across several data centers.
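The Week 1 task of checking TLS expiration dates is easy to automate with Python's standard library. The sketch below is one possible approach: the 14-day warning threshold is an assumption, and the sample dates are made up so that the example runs without network access.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter field.

    Note: with default verification, an already expired certificate raises
    ssl.SSLCertVerificationError here, which is itself a signal to act.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before the certificate expires (negative if already expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def needs_renewal(days_left: float, warn_days: float = 14) -> bool:
    """Assumed policy: alert when fewer than warn_days remain."""
    return days_left <= warn_days


# Offline demo with made-up dates; in production, call fetch_not_after(host).
not_after = "Jan 15 12:00:00 2030 GMT"
now = ssl.cert_time_to_seconds("Jan 05 12:00:00 2030 GMT")
days_left = days_until_expiry(not_after, now)
print(days_left, needs_renewal(days_left))  # 10.0 True
```

Running such a check daily from the external monitoring host, and wiring the result into the alerting set up in Week 2, removes expired certificates as a cause of downtime.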
Conclusion
The “99%” figure looks solid until you convert it into real hours of downtime. By skimping on infrastructure, companies often end up paying more in lost orders, lower search rankings, and tired customers. Focus on three pillars: fault-tolerant architecture, operational readiness (MTTR, monitoring, on-call), and a transparent SLA with clear compensation. Then marketing percentages turn into predictable availability, and availability turns into stable revenue and growth.