
Introduction

Reliable solutions operate effectively and dependably. Reliable architectures are available when and where users expect, perform consistently, and scale with the business.

Reliability is a representation of system quality. A system that is not error-prone, behaves as expected, and provides results in a timely manner is highly reliable. Conversely, a system that takes too long to complete tasks (at least from a user’s point of view), doesn’t do what users expect, or “errors out” at critical times is not reliable. Because unreliable systems can’t be counted on to provide an accurate view of the information stakeholders need to make key business decisions, they erode stakeholders’ trust in the system itself.

The reliability of a system is not constant. A system that’s reliable today may become unreliable in the future if it hasn’t been designed to grow and evolve with your organization. This can lead to costly maintenance and the need to refactor or completely re-implement a system, using funds that could have been spent on more strategic projects.

You can improve reliability in your Salesforce solutions by focusing on three key areas: availability, performance, and scalability.

Availability

Availability is a measure of the percentage of time that your system is operational. The Salesforce Platform handles most infrastructure-level availability concerns for you. However, the availability of the solutions you build on the platform, as experienced by your users and customers, is a shared responsibility. It’s important to understand that even with Salesforce’s commitment to high availability, the risk of system downtime is never zero.

Architects must prepare for Salesforce system downtime, whether from events like planned maintenance windows or unforeseen circumstances. In addition to the risks of service disruptions, you need to consider how your solutions will maintain high performance and grow with the business. Architectural choices that focus too narrowly on current requirements can lead to availability issues over time.

Think about availability during the design phase, before your solution is built. Even a single incident can cause stakeholders to lose trust and doubt the overall value of a system. The longer you defer architecting for availability, the higher the actual cost of availability issues will be in the long run. As an architect, you will need to use the language of the business, framing technical concerns in ways that make sense to business stakeholders to drive buy-in and alignment around prioritizing availability work.

You can architect for higher availability in your Salesforce solutions through risk management and failure mitigation.

Risk Management

Managing risks in the context of Salesforce architecture involves identifying potential hazards that could impact the operation of your system, users of your system (including employees, partners, and customers), and business processes. Often, the formal process of conducting risk analysis will fall under the responsibilities of project managers. As an architect, it is your responsibility to make sure any risk analysis adequately represents the concerns of both the technical and the business stakeholders relying on your solutions.

Some of the biggest pitfalls in risk management arise from simply not dedicating time and thought to the task. Far too often, teams skip risk assessment altogether. Or they conflate solving for backup and restore (an important part of mitigating risks to data integrity) with comprehensive risk assessment and mitigation.

To accurately assess risk for your Salesforce solutions, consider:

The list of patterns and anti-patterns below shows what proper (and poor) risk management looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools related to risk management, see Tools Relevant to Reliable.

Failure Mitigation

A failure point is any place in the system that can cause significant parts of your system to perform unexpectedly (or not at all) when a service disruption or problem occurs. In reality, almost any point in a system can become a failure point. Good mitigation isn’t about attempting to pinpoint every potential failure point. Instead, it’s about establishing ways to quickly and accurately classify and prioritize failure points, so that maintenance and support teams can respond effectively. (For more on this, see incident response.)

To develop better failure mitigation strategies, consider:

Trigger Classification/Type and Mitigation:
- People: Policy
- Process: Playbooks, Continuity Plans
- Technology: Redundancy

Example trigger, mitigation, and maturity levels (Basic, Intermediate, Mature):
- Trigger: User access change (new employee, departing employee)
- Mitigation: SLA and requirements around provisioning/deprovisioning users
- Basic: Provisioning/deprovisioning policy is enforced manually, according to manual SLAs
- Intermediate: Scheduled jobs process user changes according to policy, according to scheduled SLAs
- Mature: User provisioning/deprovisioning is automated via SSO/IDM solution
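
As a rough illustration of the “Intermediate” maturity level above, a scheduled Apex job can enforce a deprovisioning policy on a regular cadence. This is a minimal sketch rather than a prescribed implementation; the Deactivation_Due_Date__c field, the class name, and the nightly schedule are illustrative assumptions.

// Minimal sketch of a scheduled deprovisioning job. Assumes a custom date field
// (Deactivation_Due_Date__c) is stamped on User records by an offboarding process;
// the field name and the policy window are hypothetical, not Salesforce guidance.
global class DeprovisionUsersJob implements Schedulable {
    global void execute(SchedulableContext ctx) {
        List<User> toDeactivate = [
            SELECT Id, IsActive
            FROM User
            WHERE IsActive = true
            AND Deactivation_Due_Date__c <= :Date.today()
        ];
        for (User u : toDeactivate) {
            u.IsActive = false;
        }
        update toDeactivate; // single DML statement against the whole collection
    }
}

// Scheduled once from Anonymous Apex, for example nightly at 1:00 AM:
// System.schedule('Nightly deprovisioning', '0 0 1 * * ?', new DeprovisionUsersJob());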

The list of patterns and anti-patterns below shows what proper (and poor) failure mitigation looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about tools available from Salesforce for failure mitigation, see Tools Relevant to Reliable.

Availability Patterns and Anti-Patterns

Below are patterns to look for in your Salesforce operations and anti-patterns to avoid or target for remediation.

Risk Management

Patterns (within your business):
- An established risk assessment framework is in use
- Risks are categorized into people, process, and technology areas

Anti-Patterns (within your business):
- The risk assessment framework for Salesforce is ad hoc
- Risk isn't clearly identified

Patterns (in your documentation):
- Risk severity is categorized and assessed based on customer impact
- Risk mitigation and response plans are prioritized, focusing on highest priority risks first

Anti-Patterns (in your documentation):
- The customer perspective isn't considered when assessing risk severity or category
- Risk mitigation and response plans try to capture every risk imaginable

Failure Mitigation

Patterns (in your org):
- Failure point triggers and their corresponding mitigation plans are categorized by people, process, and technology
- Mitigation controls are put in place immediately, mature over time, and incorporate automation as early as possible

Anti-Patterns (in your org):
- Failure point triggers are not classified; mitigation approaches are ad hoc or non-existent
- Mitigation controls are not revisited or evolved
- Automation is not used in mitigation

Performance

Performance, in the context of system architecture, is a measure of a system’s overall processing capacity (throughput) and how fast it responds to requests and demands (latency). Typically, you derive an understanding of how your system performs through testing and production monitoring. A performant system completes processes in a timely manner, at any anticipated level of demand.

Poor performance goes hand-in-hand with higher latencies and lower throughputs, which lead to lower productivity and increased user frustration. Further, performance issues are often a matter of some urgency, and can lead to loss of trust among customers as well as financial losses.

You can improve the performance of your solutions by optimizing throughput and optimizing latency.

Note: Throughput and latency optimization are essential aspects of improving system processing and responsiveness. It’s important to remember, however, that overall system performance also depends on how well you architect for scale. You must consider both dimensions in your designs.

Throughput

In terms of Salesforce architecture, throughput is the number of concurrent requests a system can complete within a given time interval. Salesforce solutions that have been designed and optimized for throughput are better able to operate within the built-in governor limits of the platform.
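
For example, bulkified record handling keeps SOQL queries and DML statements out of loops, so a transaction that processes 200 records consumes roughly the same number of queries and statements as one that processes a single record, which is one of the main ways solutions stay within per-transaction governor limits. The sketch below shows the shape of a bulkified handler; the object choice and field logic are illustrative, not prescriptive.

// Bulkified handler sketch, intended to be called from a before insert/update
// Opportunity trigger: one query per transaction, no SOQL or DML inside loops.
public class OpportunityHandler {
    public static void applyAccountRating(List<Opportunity> newOpps) {
        // Collect parent Ids first instead of querying inside the loop.
        Set<Id> accountIds = new Set<Id>();
        for (Opportunity opp : newOpps) {
            if (opp.AccountId != null) {
                accountIds.add(opp.AccountId);
            }
        }

        // One selective query for all parents in the batch.
        Map<Id, Account> accountsById = new Map<Id, Account>(
            [SELECT Id, Rating FROM Account WHERE Id IN :accountIds]
        );

        // Work against the in-memory collection; no DML is needed in a before trigger.
        for (Opportunity opp : newOpps) {
            Account parent = accountsById.get(opp.AccountId);
            if (parent != null) {
                opp.Description = 'Account rating: ' + parent.Rating;
            }
        }
    }
}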

Optimizing throughput in Salesforce begins with accurately calculating workloads in your system and planning for growth. Without accurate projections for what kinds of demands will be made on the system, you will be unable to pinpoint potential issues with the throughput capabilities of your system. There are three dimensions to consider when thinking about workloads:

When thinking about performance, there can be a tendency to focus too narrowly on compute and the constraints on maximum CPU time that are among the platform’s governor limits. Teams that maintain this narrow focus overlook many methods for optimizing throughput that are unrelated to raw processing power. By expanding your view and applying these methods (outlined below), you can improve the overall throughput and efficiency of your Salesforce architectures, which in turn will help reduce latency and increase overall system performance.
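
One such method is caching. Storing values that are expensive to query or compute means repeated requests don’t spend SOQL queries or CPU time recalculating them. Below is a minimal Platform Cache sketch; it assumes an org cache partition named ExchangeRates has been configured, and the partition name, cache key, and fallback logic are all illustrative.

// Read-through Platform Cache sketch for a value that is expensive to recompute.
// The partition 'local.ExchangeRates' and the rate lookup are hypothetical.
public class ExchangeRateService {
    private static final String PARTITION_NAME = 'local.ExchangeRates';
    private static final String RATE_KEY = 'usdToEur';

    public static Decimal getUsdToEur() {
        Cache.OrgPartition part = Cache.Org.getPartition(PARTITION_NAME);
        Decimal cached = (Decimal) part.get(RATE_KEY);
        if (cached != null) {
            return cached; // cache hit: no query, callout, or recalculation
        }
        Decimal fresh = loadRateFromSource();
        part.put(RATE_KEY, fresh, 3600); // cache for one hour
        return fresh;
    }

    private static Decimal loadRateFromSource() {
        // Placeholder for the expensive lookup (query, callout, or calculation).
        return 0.92;
    }
}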

To optimize throughput in your system, consider:

The list of patterns and anti-patterns below shows what proper (and poor) throughput looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for throughput optimization, see Tools Relevant to Reliable.

Latency

Latency is a measure of how fast a system completes an execution path. Optimizing the throughput of your system will contribute to improving latency. Another dimension of latency is perceived performance, or how responsive the system seems to a user.

People don’t want to wait for pages to load or processes to finish. Users of your system will become frustrated if they frequently experience lengthy load times when trying to navigate list views, record pages, reports, and so on. When this happens, customers or partners may decide to take their business elsewhere rather than deal with poorly performing systems. Internally, employees may create workarounds to avoid using the system as designed, which can create downstream issues for security and data integrity.

Perceived performance can be difficult to diagnose. When a user reports slowness, support teams may be unable to reproduce the problem because increased latency is often the cumulative result of several smaller issues that build on each other, which makes pinpointing the exact cause difficult.
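
One common way to improve perceived performance is to return control to the user quickly and defer slow work to an asynchronous job. The sketch below uses Queueable Apex; the class name and the “enrichment” logic are illustrative placeholders for whatever heavy processing would otherwise block the user’s request.

// Minimal Queueable sketch: the user's transaction only enqueues the job,
// and the slow work runs in the background.
public class EnrichAccountJob implements Queueable {
    private Id accountId;

    public EnrichAccountJob(Id accountId) {
        this.accountId = accountId;
    }

    public void execute(QueueableContext ctx) {
        List<Account> accounts = [
            SELECT Id, Description FROM Account WHERE Id = :accountId
        ];
        if (accounts.isEmpty()) {
            return;
        }
        // Placeholder for expensive processing (recalculation, enrichment, and so on).
        accounts[0].Description = 'Enriched asynchronously at ' + System.now();
        update accounts;
    }
}

// From a controller or trigger handler, the synchronous path stays fast:
// System.enqueueJob(new EnrichAccountJob(someAccountId));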

To reduce latency and improve responsiveness in your Salesforce system, consider:

The list of patterns and anti-patterns below shows what proper (and poor) latency looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for latency optimization, see Tools Relevant to Reliable.

Performance Patterns and Anti-Patterns

Below are patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.

Throughput

Patterns (in your design standards):
- Guidance for how to use Platform Cache adheres to Platform Cache Best Practices

Anti-Patterns (in your design standards):
- If there is guidance for Platform Cache usage, it is unclear or does not align with recommended best practices

Patterns (in your org):
- Bulkification is used for data and system operations
- DML or Database methods always operate against collections in Apex
- All wildcard criteria appear in SOSL
- SOQL statements are selective, including:
-- no usage of LIKE comparisons or partial text comparisons
-- comparison operators use positive logic (for example, INCLUDES, IN) as the primary or only logic
-- usage of = NULL, != NULL is rare and/or always follows a positive comparison operator
-- no LIMIT 1 statements appear
-- no usage of the ALL ROWS keyword
- Asynchronous processing is favored where possible
- Platform Cache Partitions are configured

Anti-Patterns (in your org):
- DML statements are not bulkified
- DML or Database methods operate against single records in Apex
- SOSL is rarely or not consistently used for wildcard selection criteria
- SOQL statements are non-selective, including:
-- LIKE and wildcard filter criteria appear
-- NOT, NOT IN criteria are used as the primary or only comparison operators
-- = NULL, != NULL criteria are used as the primary or only comparison operator
-- LIMIT 1 statements appear
-- the ALL ROWS keyword is used
- SOQL appears within loops
- Synchronous processes are favored
- Visualforce view state is used for application caching

Latency

Patterns (in your org):
- Reports serve a single specific purpose and contain the minimum number of rows and columns needed to make decisions
- Filters use equals/not equal
- Filters do not contain formula fields
- Sharing models are simplified as much as possible
- Custom UI components use Lightning Web Components
- LWC uses Lightning Data Service for data operations
- Sorting and filtering list data is handled on the client side in JavaScript
- Salesforce Edge is enabled

Anti-Patterns (in your org):
- Reports serve multiple purposes or contain extra rows and columns that aren't needed to make decisions
- Filters use contains/does not contain
- Filters contain formula fields
- Sharing models are complex
- Custom UI components use Aura or Visualforce
- LWC uses Apex for data operations
- Sorting and filtering list data is handled on the server side using Apex
- Salesforce Edge is not enabled
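
To make the SOQL selectivity patterns above more concrete, here is a minimal sketch of a selective query alongside a non-selective variant to avoid. The object and filter values are illustrative; actual selectivity depends on your data distribution and indexes.

// Selective: positive filters (IN) plus a range on CreatedDate, a standard indexed field.
List<Case> recentOpenCases = [
    SELECT Id, Subject, Status
    FROM Case
    WHERE Status IN ('New', 'Working')
    AND CreatedDate = LAST_N_DAYS:30
];

// Non-selective (avoid): leading wildcard and negative comparison as the only criteria.
// [SELECT Id FROM Case WHERE Subject LIKE '%refund%' AND Status != 'Closed']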

Scalability

Scalability is a measure of a system’s ability to continue to perform over time as it evolves and grows. A system that is scalable is able to handle large increases in transaction volumes or concurrent access without requiring foundational changes. Salesforce’s platform services are designed to support application scalability (for more on this, see Internal Platform Processing). With that said, as your organization grows and demand for your products and services increases, you are responsible for creating a system that can perform effectively (and as expected) as demands on the system and data volumes increase. Architecting for scalability from the start will result in faster delivery timelines for new features and less downtime as users and traffic increase.

Sometimes, a business will reach a critical “tipping point” where a system’s original design can no longer support the current degree of scale, and unexpected events send the system into an unstable state. Systems that have not been designed for scalability require constant troubleshooting, redesign, and refactoring — often at great cost to the business. Scalability issues commonly compound over time, creating multiple performance degradations throughout a system. In some cases, businesses find themselves spending a majority of development and maintenance resources on addressing scalability issues, instead of new features that create value.

You can better architect for scale by focusing on data model optimization and data volume management.

Note: Though not discussed here, testing for scalability is a critical part of validating your application architectures. For guidance, see testing strategy.

Data Modeling

Data modeling involves structuring the objects in your org (and relating them to one another) in a way that enables your users and automated processes to retrieve the data they need as quickly as possible. Taking steps to improve throughput will address many performance issues, but your efforts won’t be as effective without an optimized data model.

The negative impacts of a poorly designed data model are not immediately noticeable; rather, the weaknesses are exposed as the system grows in terms of data volume, processes, users, and integrations. A well-designed data model makes it easier to continuously refactor your application as requirements are added and extended.
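
One concrete example of designing for read-heavy scale is denormalization: copying a frequently filtered parent attribute onto the child object so that high-volume queries and reports can filter on the child directly instead of traversing the relationship. The sketch below assumes hypothetical Region__c custom fields on both Account and Case; the trade-off is that the copied value must be kept in sync at write time.

// Denormalization sketch: stamp the parent Account's region onto each Case so list
// views, reports, and queries can filter on Case.Region__c without a join.
// Both Region__c fields are hypothetical custom fields.
trigger CaseRegionStamp on Case (before insert, before update) {
    Set<Id> accountIds = new Set<Id>();
    for (Case c : Trigger.new) {
        if (c.AccountId != null) {
            accountIds.add(c.AccountId);
        }
    }

    Map<Id, Account> accountsById = new Map<Id, Account>(
        [SELECT Id, Region__c FROM Account WHERE Id IN :accountIds]
    );

    for (Case c : Trigger.new) {
        Account parent = accountsById.get(c.AccountId);
        c.Region__c = (parent != null) ? parent.Region__c : null;
    }
}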

To optimize your data model, consider:

The list of patterns and anti-patterns below shows what proper (and poor) data model optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for data model optimization, see Tools Relevant to Reliable.

Data Volume

Data volume is a measure of the amount of data stored within your system, based on record counts and sizes. If your org has tens of thousands of users, tens of millions of records, or hundreds of gigabytes of total record storage, you have a large data volume. The combination of data volume and the relationships between objects in your org affects scalability, and will likely have a greater impact than the number of records alone.
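
Ownership and lookup skew are common pitfalls at large data volumes: as the 10,000-record thresholds in the patterns table below suggest, concentrating too many child or owned records on a single parent or user can lead to record lock exceptions and expensive sharing recalculations. One mitigation is logic that distributes new child records across a pool of parent records. The sketch below is illustrative only; the Is_Skew_Bucket__c checkbox field and the class name are assumptions.

// Round-robin assignment sketch to avoid lookup and ownership skew: spread incoming
// Cases across a pool of "bucket" Accounts instead of parenting them all to one record.
// Is_Skew_Bucket__c is a hypothetical checkbox field that marks the bucket Accounts.
public class CaseBucketAssigner {
    public static void assignBuckets(List<Case> newCases) {
        List<Account> buckets = [
            SELECT Id FROM Account WHERE Is_Skew_Bucket__c = true ORDER BY Name
        ];
        if (buckets.isEmpty()) {
            return; // nothing to distribute across
        }
        Integer i = 0;
        for (Case c : newCases) {
            if (c.AccountId == null) {
                c.AccountId = buckets[Math.mod(i, buckets.size())].Id;
                i++;
            }
        }
    }
}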

To improve the scalability of orgs with large data volumes, consider:

In practice, you may not always be able to immediately address the root cause of a scalability issue when problems arise. For this reason, Salesforce does provide options to help ease immediate pain points. It is important to know that enabling these features in your org is not a viable, long-term architectural strategy for dealing with large data volumes. These short-term, stopgap workarounds can help reduce latency in systems suffering from poor data architecture, but they can also add technical debt to your org.

Short-term workarounds for scale issues include:

The list of patterns and anti-patterns below shows what proper (and poor) data volume management looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for managing data volumes, see Tools Relevant to Reliable.

Scalability Patterns and Anti-Patterns

Below are patterns to look for (or build) in your org, and anti-patterns to avoid or target for remediation.

Data Modeling

Patterns (in your design standards):
- Standards and guidance for which business justifications warrant a custom object exist

Anti-Patterns (in your design standards):
- No standards for creating custom objects exist

Patterns (in your data model):
- Standard objects are used where possible
- Tables have been denormalized for scale

Anti-Patterns (in your data model):
- You have replicated standard objects
- Tables have been normalized to avoid redundancy

Patterns (within your business):
- Low-code builders understand the different field types supported by Salesforce, and evaluate reporting and encryption requirements before selecting field data types
- Sharing and data skew implications are evaluated before choosing to establish a master-detail relationship between objects

Anti-Patterns (within your business):
- Low-code builders select data types without evaluating downstream reporting and encryption requirements
- Sharing and data skew are not considered before establishing master-detail relationships between objects

Data Volume

Patterns (in your data):
- No parent records have more than 10,000 child records
- No users are assigned to more than 10,000 records of the same object type
- No instances exist where more than 10,000 records have lookup fields that point to the same record
- Bulk data loads are sorted into batches according to ParentId field values
- Bulk data loads into production do not occur during peak business hours
- Bulk data loads include only the minimum data needed for business decisions

Anti-Patterns (in your data):
- Records with more than 10,000 child records exist
- Users are assigned to more than 10,000 records of the same type
- Instances exist where more than 10,000 records have lookup fields that point to the same record
- Bulk data loads are not sorted into batches according to ParentId field values
- Bulk data loads into production occur during peak business hours
- Bulk data loads are not limited to the minimum data needed for business decisions

Patterns (in Flow and Apex):
- Logic exists to distribute the number of child records across multiple parent records in scenarios where data skew is a concern
- Logic exists to assign all records to the appropriate human users when imported or replicated via an integration

Anti-Patterns (in Flow and Apex):
- Child records are arbitrarily assigned to parent records regardless of the number of existing child records that have already been assigned
- Records created via data loads or integrations are assigned to a generic "integration user"

Patterns (within your business):
- You have documented and implemented a data archiving and purging strategy

Anti-Patterns (within your business):
- You do not have a data archiving and purging strategy or your strategy has been documented but not implemented

Tools Relevant to Reliable

Each tool is listed with the reliability dimension(s) it supports: Availability, Performance, or Scalability.

- Big Objects: Store and manage large volumes of data on-platform (Scalability)
- Code Scanner: Scan Apex code for performance issues (Performance)
- Custom Indexes: Improve query performance with custom indexes (Performance)
- Deleting Data: Remove unneeded data to improve performance (Performance, Scalability)
- Divisions: Partition data to limit record counts in queries and reports (Scalability)
- Performance Assistant: Test system performance and interpret the results (Performance)
- Salesforce Code Analyzer: Scan code via IDE, CLI, or CI/CD to ensure it adheres to best practices (Performance)
- Salesforce Edge Network: Improve download times and the user experience by routing your My Domain through Salesforce Edge Network (Performance)
- Skinny Tables: Avoid joins on tables with frequently used fields (Performance)

Resources Relevant to Reliable

- Analyze Performance & Scale Hotspots in Complex Salesforce Apps: An approach to address performance and scalability issues in your org (Performance, Scalability)
- Best Practices for Deployments with Large Data Volumes: Understand process impacts of large data volumes (Scalability)
- Considerations for Salesforce Edge Network: Find out how to prepare your org to use Salesforce Edge Network (Performance)
- Design Standards Template: Create design standards for your organization (Availability, Performance, Scalability)
- Data Model Design Considerations: Optimize data models for scale and maintenance (Performance, Scalability)
- Designing Record Access for Enterprise Scale: Optimize access control performance through configuration (Scalability)
- Infrastructure for Systems with Large Data Volumes: Learn about capabilities that support system performance with LDV (Scalability)
- Learning Resources for Batch Management: Learn about Batch Management (Performance, Scalability)
- Lightning Experience Performance Optimization: Improve your Lightning Experience to help users work faster (Performance)
- Managing Lookup Skew in Salesforce to Avoid Record Lock Exceptions: Understand how to minimize the effects of lookup skews (Performance, Scalability)
- SOQL and SOSL Best Practices: Follow SOQL and SOSL best practices for LDV (Performance, Scalability)
- Tools for Large-Scale Realignments: Plan and execute realignments effectively (Scalability)
- Using Mashups: Maintain large data sets in a different application (Performance, Scalability)
