Content last updated July 2022.

Introduction

Reliable solutions operate effectively and dependably. Reliable architectures are available when and where users expect, perform consistently, and scale with the business.

Reliability is a representation of system quality. A system that is not error-prone, behaves as expected, and provides results in a timely manner is highly reliable. Conversely, a system that takes too long to complete tasks (at least from a user’s point of view), doesn’t do what users expect, or “errors out” at critical times is not reliable. Because unreliable systems can’t be counted on to provide an accurate view of the information stakeholders need to make key business decisions, they undermine stakeholders’ trust in the system.

The reliability of a system is not constant. A system that’s reliable today may become unreliable in the future if it hasn’t been designed to grow and evolve with your organization. This can lead to costly maintenance and the need to refactor or completely re-implement a system, using funds that could have been spent on more strategic projects.

You can improve reliability in your Salesforce solutions by focusing on three key areas: availability, performance, and scalability.

Availability

Availability is a measure of the percentage of time that your system is operational. The Salesforce Platform handles most infrastructure-level availability concerns for you. However, the availability of the solutions you build on the platform, as experienced by your customers, is a shared responsibility. It’s important to understand that even with Salesforce’s commitment to high availability, the risk of system downtime is never zero.
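To make availability concrete, it helps to translate a target percentage into a downtime budget. The Python sketch below is illustrative only (it is not how Salesforce calculates or commits to service levels); it converts an availability target into allowable downtime over a period:

```python
def downtime_budget_minutes(target_pct: float, days: int = 30) -> float:
    """Minutes of allowable downtime over `days` for a given availability target."""
    return days * 24 * 60 * (1 - target_pct / 100)

def observed_availability(uptime_minutes: float, total_minutes: float) -> float:
    """Observed availability as a percentage of total elapsed time."""
    return 100 * uptime_minutes / total_minutes
```

For example, a 99.9% target allows roughly 43 minutes of downtime in a 30-day month, which is a useful framing when discussing availability trade-offs with business stakeholders.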

Architects must prepare for Salesforce system downtime, whether from events like planned maintenance windows or unforeseen circumstances. In addition to the risks of service disruptions, you need to consider how your solutions will maintain high performance and grow with the business. Architectural choices that focus too narrowly on current requirements can lead to availability issues over time.

Think about availability during the design phase, before your solution is built. Even a single incident can cause stakeholders to lose trust and doubt the overall value of a system. The longer you defer architecting for availability, the higher the actual cost of availability issues will be in the long run. As an architect, you will need to use the language of the business, framing technical concerns in ways that make sense to business stakeholders to drive buy-in and alignment around prioritizing availability work.

You can architect for higher availability for your Salesforce solutions through risk assessment and failure mitigation.

Risk Assessment

Assessing risks in the context of Salesforce architecture involves identifying potential hazards that could impact the operation of your system, users of your system (including employees, partners, and customers), and business processes. Often, the formal process of conducting risk analysis will fall under the responsibilities of project managers. As an architect, it is your responsibility to make sure any risk analysis adequately represents the concerns of both the technical and the business stakeholders relying on your solutions.

Some of the biggest pitfalls in risk assessment arise from simply not dedicating time and thought to the task. Far too often, teams skip risk assessment altogether. Or they conflate solving for backup and restore (an important part of mitigating risks to data integrity) with comprehensive risk assessment and mitigation.
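A lightweight way to keep risk assessment from being skipped is to maintain a simple risk register. The sketch below is a generic Python illustration, not a Salesforce feature, and the 1-to-5 scoring scale is an assumption; it categorizes risks into people, process, and technology, scores severity by likelihood and customer impact, and orders mitigation work so the highest-priority risks come first:

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    category: str          # "people", "process", or "technology"
    likelihood: int        # 1 (rare) .. 5 (frequent) -- assumed scale
    customer_impact: int   # 1 (minor) .. 5 (severe) -- assumed scale

    @property
    def score(self) -> int:
        # Severity is driven by customer impact, weighted by likelihood.
        return self.likelihood * self.customer_impact

def prioritize(risks: list[Risk]) -> list[Risk]:
    """Highest-priority risks first, so mitigation plans address them first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)
```

Ordering by score (rather than trying to capture every risk imaginable) keeps mitigation planning focused, which matches the patterns described later in this section.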

To accurately assess risk for your Salesforce solutions, consider:

The list of patterns and anti-patterns below shows what proper (and poor) risk assessment looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools related to risk assessment, see Tools Relevant to Reliable.

Failure Mitigation

A failure point is any place in the system that can cause significant parts of your system to perform unexpectedly (or not at all) when a service disruption or problem occurs. In reality, almost any point in a system could turn into a failure point. Good mitigation isn’t about attempting to pinpoint every potential failure point in a system. Instead, it establishes ways to quickly and accurately classify and prioritize failure points and responses, so maintenance and support teams can respond effectively. (For more on this, see incident response.)

To develop better failure mitigation strategies, consider:

Trigger Classification/Type | Mitigation
People | Policy
Process | Playbooks, Continuity Plans
Technology | Redundancy

Trigger | Mitigation | Basic | Intermediate | Mature
User access change (new employee, departing employee) | SLA and requirements around provisioning/deprovisioning users | Provisioning/deprovisioning policy is enforced manually, according to documented SLAs | Scheduled jobs process user changes according to policy and scheduled SLAs | User provisioning/deprovisioning is automated via an SSO/IDM solution
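The classification and maturity tables above can be expressed as a small lookup structure that a runbook or support script might consult. This Python sketch is purely illustrative; the keys and descriptions simply restate the tables:

```python
# Mitigation approach by failure-trigger classification (from the table above).
MITIGATION_BY_CLASS = {
    "people": ["policy"],
    "process": ["playbooks", "continuity plans"],
    "technology": ["redundancy"],
}

# Maturity ladder for one example trigger: user access changes.
USER_ACCESS_MATURITY = {
    "basic": "Provisioning/deprovisioning policy enforced manually per SLA",
    "intermediate": "Scheduled jobs process user changes according to policy",
    "mature": "Provisioning/deprovisioning automated via SSO/IDM",
}

def mitigation_for(classification: str) -> list[str]:
    """Look up the mitigation approaches for a failure-trigger classification."""
    return MITIGATION_BY_CLASS.get(classification.lower(), [])
```

Encoding the classification scheme in one place makes it easy for support teams to respond consistently instead of improvising per incident.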

The list of patterns and anti-patterns below shows what proper (and poor) failure mitigation looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about tools available from Salesforce for failure mitigation, see Tools Relevant to Reliable.

Availability Patterns and Anti-Patterns

The following table outlines patterns to look for in your Salesforce operations and anti-patterns to avoid or target for remediation.

Risk Assessment

Patterns:

Within your business:
- An established risk assessment framework is in use
- Risks are categorized into people, process, and technology areas

In your documentation:
- Risk severity is categorized and assessed based on customer impact
- Risk mitigation and response plans are prioritized, focusing on the highest-priority risks first

Anti-Patterns:

Within your business:
- The risk assessment framework for Salesforce is ad hoc
- Risk isn't clearly identified

In your documentation:
- The customer perspective isn't considered when assessing risk severity or category
- Risk mitigation and response plans try to capture every risk imaginable

Failure Mitigation

Patterns:

In your org:
- Failure point triggers and their corresponding mitigation plans are categorized by people, process, and technology
- Mitigation controls are put in place immediately, mature over time, and incorporate automation as early as possible

Anti-Patterns:

In your org:
- Failure point triggers are not classified; mitigation approaches are ad hoc or non-existent
- Mitigation controls are not revisited or evolved
- Automation is not used in mitigation

Performance

Performance, in the context of system architecture, is a measure of a system’s overall processing capacity (throughput) and how fast it responds to requests and demands (latency). Typically, you derive an understanding about how your system performs through testing and production monitoring. A performant system completes processes within a timely manner, at any anticipated level of demand.
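Throughput and latency can be measured directly. The Python sketch below is a generic illustration (not a Salesforce tool): it times a batch of requests against any callable and reports both dimensions:

```python
import time

def measure(fn, requests: int) -> dict:
    """Run `fn` once per request; report throughput and average latency."""
    latencies = []
    start = time.perf_counter()
    for _ in range(requests):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput_rps": requests / elapsed,                    # completed requests per second
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),  # mean response time
    }
```

A system can have good average latency and still poor throughput (or vice versa), which is why both numbers belong in your testing and production monitoring.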

Poor performance goes hand-in-hand with higher latencies and lower throughputs, which lead to lower productivity and increased user frustration. Further, performance issues are often a matter of some urgency, and can lead to loss of trust among customers as well as financial losses.

You can improve the performance of your solutions by optimizing throughput and optimizing latency.

Note: Throughput and latency optimizations are essential aspects of improving system processing and responsiveness. It’s important to remember, however, that overall system performance also depends on how well you architect for scale. You must consider both dimensions in your designs.

Throughput Optimization

In terms of Salesforce architecture, throughput is the number of concurrent requests a system can complete within a given time interval. Salesforce solutions that have been designed and optimized for throughput operate better within the platform’s built-in governor limits.

Optimizing throughput in Salesforce begins with accurately calculating workloads in your system and planning for growth. Without accurate projections for what kinds of demands will be made on the system, you will be unable to pinpoint potential issues with the throughput capabilities of your system. There are three dimensions to consider when thinking about workloads:

When thinking about performance, there can be a tendency to focus too narrowly on compute and the constraints on maximum CPU time that are among the platform’s governor limits. Teams that maintain this narrow focus overlook many methods for optimizing throughput that are unrelated to raw processing power. By expanding your view and applying these methods (outlined below), you can improve the overall throughput and efficiency of your Salesforce architectures, which in turn will help reduce latency and increase overall system performance.
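One of the most familiar throughput methods is bulkification: issuing fewer, larger operations instead of one operation per record. The essence of the idea is shown here as a language-agnostic Python sketch; the 200-record batch size mirrors a common Apex batching convention, but the appropriate limit depends on the operation:

```python
def chunk(records, size):
    """Split records into fixed-size batches so each request stays within limits."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def process_in_bulk(records, handler, batch_size=200):
    """Issue one handler call per batch instead of one per record --
    the essence of bulkification: fewer, larger operations."""
    calls = 0
    for batch in chunk(records, batch_size):
        handler(batch)   # e.g. one query or one DML-style operation per batch
        calls += 1
    return calls
```

Processing 450 records this way costs 3 operations instead of 450, which is exactly the kind of reduction that keeps a solution inside per-transaction limits as volumes grow.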

To optimize throughput in your system, consider:

The list of patterns and anti-patterns below shows what proper (and poor) throughput optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for throughput optimization, see Tools Relevant to Reliable.

Latency Optimization

Latency is a measure of how fast a system completes an execution path. Optimizing the throughput of your system will contribute to improving latency. Another dimension of latency is perceived performance, or how responsive the system seems to a user.

People don’t want to wait for pages to load or processes to finish. Users of your system will become frustrated if they frequently experience lengthy load times when trying to navigate list views, record pages, reports, and so on. When this happens, customers or partners may decide to take their business elsewhere rather than deal with poorly performing systems. Internally, employees may create workarounds to avoid using the system as designed, which can create downstream issues for security and data integrity.

Perceived performance can be difficult to diagnose. When a user reports a slowness issue, support teams may be unable to reproduce it, because increased latency is often the result of a combination of smaller issues that build on each other, making the exact cause hard to pinpoint.
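Because averages hide the slow experiences users actually notice, tracking tail latency percentiles makes compounding small issues visible. A minimal Python sketch using the nearest-rank method (a generic monitoring technique, not a Salesforce API):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_report(samples_ms):
    """Tail-focused summary: the p95/p99 values users actually feel."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

If p50 looks healthy but p95 or p99 climbs over time, some fraction of users is already feeling the slowness that support tickets will eventually describe.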

To reduce latency and improve responsiveness in your Salesforce system, consider:

The list of patterns and anti-patterns below shows what proper (and poor) latency optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for latency optimization, see Tools Relevant to Reliable.

Performance Patterns and Anti-Patterns

The following table outlines patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.

Throughput Optimization

Patterns:

In your design standards:
- Guidance for how to use Platform Cache adheres to Platform Cache Best Practices

In your org:
- Bulkification is used for data and system operations
- SOSL is used for reads and SOQL is used for writes
- Asynchronous processing is favored where possible
- Platform Cache Partitions are configured

Anti-Patterns:

In your design standards:
- If there is guidance for Platform Cache usage, it is unclear or does not align with recommended best practices

In your org:
- Bulkification is not used
- SOQL is used for reads and writes
- Synchronous processes are favored
- Visualforce view state is used for application caching

Latency Optimization

Patterns:

In your org:
- Reports serve a single, specific purpose and contain the minimum number of rows and columns needed to make decisions
- Filters use equals/not equal
- Filters do not contain formula fields
- Sharing models are simplified as much as possible
- Custom UI components use Lightning Web Components
- LWC uses Lightning Data Service for data operations
- Sorting and filtering list data is handled on the client side in JavaScript

Anti-Patterns:

In your org:
- Reports serve multiple purposes or contain extra rows and columns that aren't needed to make decisions
- Filters use contains/does not contain
- Filters contain formula fields
- Sharing models are complex
- Custom UI components use Aura or Visualforce
- LWC uses Apex for data operations
- Sorting and filtering list data is handled on the server side using Apex

Scalability

Scalability is a measure of a system’s ability to continue to perform over time as it evolves and grows. A system that is scalable can handle large increases in transaction volumes or concurrent access without requiring foundational changes. Salesforce’s platform services are designed to support application scalability (for more on this, see Internal Platform Processing). That said, as your organization grows and demand for your products and services increases, you are responsible for creating a system that can perform effectively (and as expected) as demands on the system and data volumes increase. Architecting for scalability from the start will result in faster delivery timelines for new features and less downtime as users and traffic increase.

Sometimes, a business will reach a critical “tipping point” where a system’s original design can no longer support the current degree of scale, and unexpected events send the system into an unstable state. Systems that have not been designed for scalability require constant troubleshooting, redesign, and refactoring — often at great cost to the business. Scalability issues commonly compound over time, creating multiple performance degradations throughout a system. In some cases, businesses find themselves spending a majority of development and maintenance resources on addressing scalability issues, instead of new features that create value.

You can better architect for scale by focusing on three keys: data model optimization, data volume management, and scale testing.

Data Model Optimization

Data model optimization involves structuring the objects in your org (and relating them to one another) in a way that enables your users and automated processes to retrieve the data they need as quickly as possible. Taking steps to improve throughput will address many performance issues, but your efforts won’t be as effective without an optimized data model.

The negative impacts of a poorly designed data model are not immediately noticeable; rather, its weaknesses are exposed as the system grows in terms of data volume, processes, users, and integrations. A well-designed data model makes it easier to continuously refactor your application as requirements are added and extended.

To optimize your data model, consider:

The list of patterns and anti-patterns below shows what proper (and poor) data model optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for data model optimization, see Tools Relevant to Reliable.

Data Volume Management

Data volume is a measure of the amount of data stored within your system, based on record counts and sizes. If your org has tens of thousands of users, tens of millions of records, or hundreds of gigabytes of total record storage, you have a large data volume. Both the volume of data and the relationships between objects in your org affect scalability; complex object relationships often have a greater impact than record counts alone.
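One concrete check for large data volumes is parent-child and ownership skew. This Python sketch flags parent records that exceed the 10,000-child guideline cited in the patterns below; in a real org you would gather the counts with aggregate queries or platform tooling rather than exporting records, so treat this as a conceptual illustration:

```python
from collections import Counter

SKEW_THRESHOLD = 10_000   # the guideline used in this section

def skewed_parents(child_parent_ids, threshold=SKEW_THRESHOLD):
    """Given one parent ID per child record, return parents whose child
    count exceeds the threshold (candidates for data skew problems)."""
    counts = Counter(child_parent_ids)
    return {pid: n for pid, n in counts.items() if n > threshold}
```

The same counting approach applies to record ownership (records per user) and to lookup skew (records pointing at the same lookup target).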

To improve the scalability of orgs with large data volumes, consider:

In practice, you may not always be able to immediately address the root cause of a scalability issue when problems arise. For this reason, Salesforce provides options to help ease immediate pain points. Keep in mind that enabling these features in your org is not a viable long-term architectural strategy for dealing with large data volumes. These short-term, stopgap workarounds can help reduce latency in systems suffering from poor data architecture, but they can also add technical debt to your org.

Short-term workarounds for scale issues include:

The list of patterns and anti-patterns below shows what proper (and poor) data volume management looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for managing data volumes, see Tools Relevant to Reliable.

Scale Testing

Scale testing is a specific area within performance testing. Performance testing overall is concerned with how well an application or system holds up under a variety of conditions — including differing levels of demand. If your Salesforce system handles B2C-scale applications, has a large volume of users, or has large volumes of record data, ensure you incorporate performance and scale testing into your application lifecycle management (ALM) process.

To maximize the benefits of performance and scale testing, make it an integral, intentional part of your quality assurance processes at a reasonably early stage in the development cycle. (For a more in-depth look at Salesforce testing and recommendations for structuring test cycles, see Testing Strategy.)
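A basic scale test ramps concurrency while measuring throughput, so degradation appears in testing rather than in production. The Python sketch below is a generic harness, not a Salesforce testing tool; the concurrency levels and request counts are illustrative assumptions:

```python
import concurrent.futures
import time

def ramp_test(task, levels=(1, 2, 4, 8), requests_per_level=20):
    """Run `task` at increasing concurrency levels and record throughput,
    so you can see where adding load stops adding completed work."""
    results = {}
    for workers in levels:
        start = time.perf_counter()
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            # Issue a fixed batch of requests at this concurrency level.
            list(pool.map(lambda _: task(), range(requests_per_level)))
        elapsed = time.perf_counter() - start
        results[workers] = requests_per_level / elapsed   # requests per second
    return results
```

If throughput flattens or falls as concurrency rises, you have found the point where the system stops scaling, which is exactly the information to capture before real users find it for you.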

To create effective scale tests for Salesforce, consider:

For more in-depth guidance on how to conduct scale and performance testing, see How to Scale Test and 7 Anti-Patterns in Performance and Scale Testing on the Salesforce Architect blog.

The list of patterns and anti-patterns below shows what proper (and poor) scale testing looks like for a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.

To learn more about Salesforce tools for scale testing, see Tools Relevant to Reliable.

Scalability Patterns and Anti-Patterns

The following table outlines patterns to look for (or build) in your org, and anti-patterns to avoid or target for remediation.

Data Model Optimization

Patterns:

In your design standards:
- Standards and guidance exist for which business justifications warrant a custom object

In your data model:
- Standard objects are used where possible
- Tables have been denormalized for scale

Within your business:
- Low-code builders understand the different field types supported by Salesforce, and evaluate reporting and encryption requirements before selecting field data types
- Sharing and data skew implications are evaluated before choosing to establish a master-detail relationship between objects

Anti-Patterns:

In your design standards:
- No standards for creating custom objects exist

In your data model:
- You have replicated standard objects
- Tables have been normalized to avoid redundancy

Within your business:
- Low-code builders select data types without evaluating downstream reporting and encryption requirements
- Sharing and data skew are not considered before establishing master-detail relationships between objects

Data Volume Management

Patterns:

In your data:
- No parent records have more than 10,000 child records
- No users are assigned to more than 10,000 records of the same object type
- No instances exist where more than 10,000 records have lookup fields that point to the same record

In Flow and Apex:
- Logic exists to distribute the number of child records across multiple parent records in scenarios where data skew is a concern
- Logic exists to assign all records to the appropriate human users when imported or replicated via an integration

Within your business:
- You have documented and implemented a data archiving and purging strategy

Anti-Patterns:

In your data:
- Parent records with more than 10,000 child records exist
- Users are assigned to more than 10,000 records of the same type
- Instances exist where more than 10,000 records have lookup fields that point to the same record

In Flow and Apex:
- Child records are arbitrarily assigned to parent records regardless of the number of existing child records that have already been assigned
- Records created via data loads or integrations are assigned to a generic "integration user"

Within your business:
- You do not have a data archiving and purging strategy, or your strategy has been documented but not implemented

Scale Testing

Patterns:

Within your business:
- You include scale testing as part of your QA process when you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests are prioritized on the highest-value aspects of the system
- Your scale tests have well-defined criteria
- You conduct scale testing in a Full sandbox

Anti-Patterns:

Within your business:
- You don't conduct scale tests as a part of your QA process, even though you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests aren't prioritized
- Your scale tests don't have well-defined criteria
- You conduct scale tests in a Partial Copy or Developer sandbox

Tools Relevant to Reliable

Tool | Description | Availability | Performance | Scalability
Apex Hammer Tests | Learn about Salesforce Apex testing in current and new releases | X | X |
Backup and Restore | Automatically generate backups to prevent data loss | X | |
Big Objects | Store and manage large volumes of data on-platform | | | X
Custom Indexes | Improve query performance with custom indexes | | X |
Defer Sharing Calculations | Process sharing rules after loading data | | | X
Deleting Data | Remove unneeded data to improve performance | | X | X
Divisions | Partition data to limit record counts in queries and reports | | | X
Lightning Data Service | Perform database operations without code | | X |
Lightning Platform Query Optimizer | Streamline data access to optimize queries | | X |
Lightning Usage App | Analyze performance for popular pages | | X |
Performance Assistant | Test system performance and interpret the results | | X | X
Platform Cache | Improve performance and reliability when caching data | | X |
Query Optimizer | Build efficient SOQL queries, reports, and list views | | X |
Salesforce Lightning Inspector | Inspect data about running applications and components | | X |
Salesforce Page Optimizer | Analyze Experience Cloud page performance | | X |
Skinny Tables | Avoid joins on tables with frequently used fields | | | X

Resources Relevant to Reliable

Resource | Description | Availability | Performance | Scalability
7 Anti-Patterns in Performance and Scale Testing | Avoid common anti-patterns in performance and scale testing | | X | X
Architecting for Analytics | Model data for optimal analysis and decision making | | | X
Best Practices for Deployments with Large Data Volumes | Understand process impacts of large data volumes | | | X
Custom Field Types | Review the different types of custom fields | | | X
Data Model Design Considerations | Optimize data models for scale and maintenance | | X | X
Design Standards Template | Create design standards for your organization | X | X | X
Designing Record Access for Enterprise Scale | Optimize access control performance through configuration | | | X
How to Scale Test on Salesforce | Approach scale testing in five steps | | | X
Improve inefficient related lists | Learn how to design efficient related lists | | X |
Improve SOQL Query Performance | Understand selectivity and custom indexes | | X |
Infrastructure for Systems with Large Data Volumes | Learn about capabilities that support system performance with LDV | | | X
Learning Resources for Batch Management | Learn about Batch Management | | X | X
Lightning Platform query optimization FAQ | Review answers to questions about Force.com Query Optimizer | | X |
Managing Lookup Skew in Salesforce to Avoid Record Lock Exceptions | Understand how to minimize the effects of lookup skew | | X | X
Notes on Changing Custom Field Types | Understand considerations for field type conversions | | | X
Query Plan FAQ | Optimize SOQL queries involving large volumes | | X |
Scale Test Strategy Checklist | Create and customize scale and performance test plans | | X | X
SOQL and SOSL Best Practices | Follow SOQL and SOSL best practices for LDV | | X | X
Technical Requirements for Lightning Experience | Understand Lightning Experience technical requirements | | X |
Techniques for Optimizing Performance | Optimize Salesforce performance for large data volumes | | X |
Tools for Large-Scale Realignments | Plan and execute realignments effectively | | | X
Using Mashups | Maintain large data sets in a different application | | X | X
Working with Very Large SOQL Queries | Improve SOQL query efficiency | | X |
