Content last updated January 2023.
Reliable solutions operate effectively and dependably. Reliable architectures are available when and where users expect, perform consistently, and scale with the business.
Reliability is a representation of system quality. A system that is not error-prone, behaves as expected, and provides results in a timely manner is highly reliable. Conversely, a system that takes too long to complete tasks (at least from a user’s point of view), doesn’t do what users expect, or “errors out” at critical times is not reliable. Because unreliable systems can’t be counted on to provide an accurate view of the information stakeholders need to make key business decisions, they undermine a system’s ability to be trusted.
The reliability of a system is not constant. A system that’s reliable today may become unreliable in the future if it hasn’t been designed to grow and evolve with your organization. This can lead to costly maintenance and the need to refactor or completely re-implement a system, using funds that could have been spent on more strategic projects.
You can improve reliability in your Salesforce solutions by focusing on three keys: availability, performance, and scalability.
Availability is a measure of the percentage of time that your system is operational. The Salesforce Platform handles most infrastructure-level availability concerns for you. However, the availability of the solutions you build on the platform, as experienced by your customers, is a shared responsibility. It’s important to understand that even with Salesforce’s commitment to high availability, the risk of system downtime is never zero.
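An availability percentage translates directly into a downtime budget. The sketch below (in Python, with illustrative targets rather than any actual Salesforce SLA figures) shows how little downtime a "three nines" target actually allows:

```python
# Downtime a period can absorb while still meeting an availability target.
# The targets below are illustrative, not Salesforce commitments.

def downtime_budget_minutes(availability_pct, period_minutes=30 * 24 * 60):
    """Maximum downtime (minutes) allowed in the period at the given target."""
    return period_minutes * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability -> {downtime_budget_minutes(target):.1f} min per 30-day month")
```

Even the jump from 99.9% to 99.99% shrinks the monthly budget from roughly 43 minutes to under 5.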
Architects must prepare for Salesforce system downtime, whether from events like planned maintenance windows or unforeseen circumstances. In addition to the risks of service disruptions, you need to consider how your solutions will maintain high performance and grow with the business. Architectural choices that focus too narrowly on current requirements can lead to availability issues over time.
Think about availability during the design phase, before your solution is built. Even a single incident can cause stakeholders to lose trust and doubt the overall value of a system. The longer you defer architecting for availability, the higher the actual cost of availability issues will be in the long run. As an architect, you will need to use the language of the business, framing technical concerns in ways that make sense to business stakeholders to drive buy-in and alignment around prioritizing availability work.
You can architect for higher availability for your Salesforce solutions through risk assessment and failure mitigation.
Assessing risks in the context of Salesforce architecture involves identifying potential hazards that could impact the operation of your system, users of your system (including employees, partners, and customers), and business processes. Often, the formal process of conducting risk analysis will fall under the responsibilities of project managers. As an architect, it is your responsibility to make sure any risk analysis adequately represents the concerns of both the technical and the business stakeholders relying on your solutions.
Some of the biggest pitfalls in risk assessment arise from simply not dedicating time and thought to the task. Far too often, teams skip risk assessment altogether. Or they conflate solving for backup and restore (an important part of mitigating risks to data integrity) with comprehensive risk assessment and mitigation.
To accurately assess risk for your Salesforce solutions, consider:
The list of patterns and anti-patterns below shows what proper (and poor) risk assessment looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.
To learn more about Salesforce tools related to risk assessment, see Tools Relevant to Reliable.
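The assessment itself can stay lightweight. As a hypothetical sketch (the risks, categories, and scores below are invented for illustration, not a formal framework), scoring each identified risk by likelihood and customer impact gives you a defensible order for mitigation work:

```python
# A minimal risk-prioritization sketch: score each identified risk by
# likelihood x customer impact, then address the highest scores first.
# All risks and scores here are hypothetical examples.

risks = [
    {"name": "Integration user deactivated", "category": "people",     "likelihood": 2, "impact": 5},
    {"name": "Nightly batch misses SLA",     "category": "process",    "likelihood": 4, "impact": 3},
    {"name": "API limit exhaustion",         "category": "technology", "likelihood": 3, "impact": 4},
]

def prioritized(risks):
    """Sort risks by likelihood x impact, highest first (stable for ties)."""
    return sorted(risks, key=lambda r: r["likelihood"] * r["impact"], reverse=True)

for r in prioritized(risks):
    print(r["category"], r["name"], r["likelihood"] * r["impact"])
```

Note that the categories mirror the people/process/technology split used in the patterns below.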
A failure point is any place in the system that can cause significant parts of your system to perform unexpectedly (or not at all) when a service disruption or problem occurs. In reality, almost any point in a system could turn into a failure point. Good mitigation isn’t about attempting to pinpoint every potential failure point in a system. It is instead establishing ways to quickly and accurately classify and prioritize failure points and responses to allow maintenance and support teams to respond effectively. (For more on this, see incident response.)
To develop better failure mitigation strategies, consider:
- Process: Playbooks, Continuity Plans
- User access change (new employee, departing employee): SLA and requirements around provisioning/deprovisioning users. Mitigation matures from manually enforced provisioning/deprovisioning policy (per defined SLAs), to scheduled jobs that process user changes according to policy, to user provisioning/deprovisioning automated via an SSO/IDM solution.
The list of patterns and anti-patterns below shows what proper (and poor) failure mitigation looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.
To learn more about tools available from Salesforce for failure mitigation, see Tools Relevant to Reliable.
The following table outlines patterns to look for in your Salesforce operations and anti-patterns to avoid or target for remediation.
Risk Assessment

Patterns:
- Within your business:
  - An established risk assessment framework is in use
  - Risks are categorized into people, process, and technology areas
- In your documentation:
  - Risk severity is categorized and assessed based on customer impact
  - Risk mitigation and response plans are prioritized, focusing on the highest-priority risks first

Anti-patterns:
- Within your business:
  - The risk assessment framework for Salesforce is ad hoc
  - Risks aren't clearly identified
- In your documentation:
  - The customer perspective isn't considered when assessing risk severity or category
  - Risk mitigation and response plans try to capture every risk imaginable

Failure Mitigation

Patterns (in your org):
- Failure point triggers and their corresponding mitigation plans are categorized by people, process, and technology
- Mitigation controls are put in place immediately, mature over time, and incorporate automation as early as possible

Anti-patterns (in your org):
- Failure point triggers are not classified; mitigation approaches are ad hoc or nonexistent
- Mitigation controls are not revisited or evolved
- Automation is not used in mitigation
Performance, in the context of system architecture, is a measure of a system’s overall processing capacity (throughput) and how fast it responds to requests and demands (latency). Typically, you derive an understanding about how your system performs through testing and production monitoring. A performant system completes processes within a timely manner, at any anticipated level of demand.
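Throughput and latency are easy to conflate, so a minimal sketch (with made-up timings) may help keep them distinct:

```python
# Throughput vs. latency from a set of hypothetical request timings.
# Throughput: completed requests per unit of time across the system.
# Latency: how long each individual request takes.

request_durations_s = [0.12, 0.10, 0.15, 0.11, 0.45, 0.13]  # per-request timings
wall_clock_s = 0.5  # the requests ran concurrently over this window

throughput_rps = len(request_durations_s) / wall_clock_s
avg_latency_s = sum(request_durations_s) / len(request_durations_s)

print(f"throughput: {throughput_rps:.1f} req/s, avg latency: {avg_latency_s * 1000:.0f} ms")
```

Because the requests overlap in time, the system completes 12 requests per second even though each request averages well over 100 ms; that is why the two measures must be tracked separately.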
Poor performance goes hand-in-hand with higher latencies and lower throughputs, which lead to lower productivity and increased user frustration. Further, performance issues are often a matter of some urgency, and can lead to loss of trust among customers as well as financial losses.
You can improve the performance of your solutions by optimizing throughput and optimizing latency.
Note: Throughput and latency optimizations are essential aspects of improving system processing and responsiveness. It’s important to remember, however, that overall system performance also depends on how well you architect for scale. You must consider both dimensions in your designs.
In terms of Salesforce architecture, throughput is the number of concurrent requests a system can complete within a given time interval. Salesforce solutions that have been designed and optimized for throughput operate more effectively within the platform’s built-in governor limits.
Optimizing throughput in Salesforce begins with accurately calculating workloads in your system and planning for growth. Without accurate projections for what kinds of demands will be made on the system, you will be unable to pinpoint potential issues with the throughput capabilities of your system. There are three dimensions to consider when thinking about workloads:
When thinking about performance, there can be a tendency to focus too narrowly on compute and the constraints on maximum CPU time that are among the platform’s governor limits. Teams that maintain this narrow focus overlook many methods for optimizing throughput that are unrelated to raw processing power. By expanding your view and applying these methods (outlined below), you can improve the overall throughput and efficiency of your Salesforce architectures, which in turn will help reduce latency and increase overall system performance.
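Bulkification, one of the methods referenced throughout this section, is essentially batching: consuming one unit of a limited resource per group of records rather than per record. Here is a language-agnostic sketch in Python; the batch size of 200 mirrors a common platform DML chunk size, but the numbers are illustrative:

```python
# Bulkification sketch: instead of issuing one operation per record,
# group records into fixed-size batches so each batch consumes a single
# unit of a limited resource (analogous to DML statements under
# governor limits). Batch size 200 is illustrative.

def batch(records, size=200):
    """Yield the records in fixed-size batches."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

records = list(range(1000))
operations = sum(1 for _ in batch(records))
print(f"{operations} bulk operations instead of {len(records)} row-by-row calls")
```

The same chunking idea applies whether the limited resource is DML statements, SOQL queries, or callouts.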
To optimize throughput in your system, consider:
The list of patterns and anti-patterns below shows what proper (and poor) throughput optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.
To learn more about Salesforce tools for throughput optimization, see Tools Relevant to Reliable.
Latency is a measure of how fast a system completes an execution path. Optimizing the throughput of your system will contribute to improving latency. Another dimension of latency is perceived performance, or how responsive the system seems to a user.
People don’t want to wait for pages to load or processes to finish. Users of your system will become frustrated if they frequently experience lengthy load times when trying to navigate list views, record pages, reports, and so on. When this happens, customers or partners may decide to take their business elsewhere rather than deal with poorly performing systems. Internally, employees may create workarounds to avoid using the system as designed, which can create downstream issues for security and data integrity.
Perceived performance can be difficult to diagnose. When a user reports slowness, support teams may be unable to reproduce the issue, because increased latency is often the result of several smaller issues compounding one another rather than a single obvious cause.
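One reason slowness reports are hard to reproduce: averages hide the tail. A handful of slow outliers barely moves the mean, while the users who hit them experience real pain. A small sketch with hypothetical page-load samples:

```python
# Why "it's slow" reports can be hard to reproduce: the mean hides the
# tail. The timings below are hypothetical page-load samples in ms.
import math

samples_ms = [180, 190, 200, 210, 220, 230, 240, 250, 260, 2400]

mean_ms = sum(samples_ms) / len(samples_ms)
# Nearest-rank 95th percentile: the value at position ceil(0.95 * n).
p95_ms = sorted(samples_ms)[math.ceil(0.95 * len(samples_ms)) - 1]

print(f"mean: {mean_ms:.0f} ms, p95: {p95_ms} ms")
```

The mean (under half a second) looks acceptable; the p95 (2.4 seconds) is what the frustrated user actually saw. Monitoring percentiles, not just averages, surfaces these issues.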
To reduce latency and improve responsiveness in your Salesforce system, consider:
The list of patterns and anti-patterns below shows what proper (and poor) latency optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.
To learn more about Salesforce tools for latency optimization, see Tools Relevant to Reliable.
The following table outlines patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.
Throughput Optimization

Patterns:
- In your design standards:
  - Guidance for how to use Platform Cache adheres to Platform Cache best practices
- In your org:
  - Bulkification is used for data and system operations
  - All wildcard criteria appear in SOSL
  - Few (or no) SOQL statements with wildcard criteria
  - Asynchronous processing is favored where possible
  - Platform Cache partitions are configured

Anti-patterns:
- In your design standards:
  - If there is guidance for Platform Cache usage, it is unclear or does not align with recommended best practices
- In your org:
  - Bulkification is not used
  - SOSL is not used
  - Synchronous processes are favored
  - Visualforce view state is used for application caching

Latency Optimization

Patterns (in your org):
- Reports serve a single specific purpose and contain the minimum number of rows and columns needed to make decisions
- Filters use equals/not equal to
- Filters do not contain formula fields
- Sharing models are simplified as much as possible
- Custom UI components use Lightning Web Components
- LWCs use Lightning Data Service for data operations

Anti-patterns (in your org):
- Reports serve multiple purposes or contain extra rows and columns that aren't needed to make decisions
- Filters use contains/does not contain
- Filters contain formula fields
- Sharing models are complex
- Custom UI components use Aura or Visualforce
- LWCs use Apex for data operations
- Sorting and filtering of list data are handled on the server side using Apex
Scalability is a measure of a system’s ability to continue to perform over time as it evolves and grows. A system that is scalable can handle large increases in transaction volumes or concurrent access without requiring foundational changes. Salesforce’s platform services are designed to support application scalability (for more on this, see Internal Platform Processing). That said, as your organization grows and demand for your products and services increases, you are responsible for creating a system that performs effectively (and as expected) as demands on the system and data volumes increase. Architecting for scalability from the start will result in faster delivery timelines for new features and less downtime as users and traffic increase.
Sometimes, a business will reach a critical “tipping point” where a system’s original design can no longer support the current degree of scale, and unexpected events send the system into an unstable state. Systems that have not been designed for scalability require constant troubleshooting, redesign, and refactoring — often at great cost to the business. Scalability issues commonly compound over time, creating multiple performance degradations throughout a system. In some cases, businesses find themselves spending a majority of development and maintenance resources on addressing scalability issues, instead of new features that create value.
You can better architect for scale by focusing on three keys: data model optimization, data volume management, and scale testing.
Data model optimization involves structuring the objects in your org (and relating them to one another) in a way that enables your users and automated processes to retrieve the data they need as quickly as possible. Taking steps to improve throughput will address many performance issues, but your efforts won’t be as effective without an optimized data model.
The negative impacts of a poorly designed data model are not immediately noticeable; rather, its weaknesses are exposed as the system grows in terms of data volume, processes, users, and integrations. A well-designed data model makes it easier to continuously refactor your application as requirements are added and extended.
To optimize your data model, consider:
The list of patterns and anti-patterns below shows what proper (and poor) data model optimization looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.
To learn more about Salesforce tools for data model optimization, see Tools Relevant to Reliable.
Data volume is a measure of the amount of data stored within your system, based on record counts and sizes. If your org has tens of thousands of users, tens of millions of records, or hundreds of gigabytes of total record storage, you have a large data volume. The combination of data volume and the relationships between objects in your org affects scalability, likely more than the number of records alone.
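Data skew, where too many child records pile up under a single parent, is a common large-data-volume hazard; the 10,000-child threshold used below reflects the commonly cited guideline. A hypothetical sketch for spotting skew in exported child rows:

```python
# A sketch for spotting parent-record data skew: count child records per
# parent and flag any parent over the commonly cited 10,000-child
# threshold. parent_ids stands in for the ParentId values of exported
# child rows; all IDs below are hypothetical.
from collections import Counter

SKEW_THRESHOLD = 10_000

def skewed_parents(parent_ids, threshold=SKEW_THRESHOLD):
    """Return {parent_id: child_count} for parents over the threshold."""
    counts = Counter(parent_ids)
    return {p: n for p, n in counts.items() if n > threshold}

# Example: one "catch-all" account holds 12,000 contacts.
child_rows = ["001A"] * 12_000 + ["001B"] * 300 + ["001C"] * 50
print(skewed_parents(child_rows))
```

The same counting approach works for ownership skew (records per owner) and lookup skew (records pointing at the same lookup target).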
To improve the scalability of orgs with large data volumes, consider:
In practice, you may not always be able to immediately address the root cause of a scalability issue when problems arise. For this reason, Salesforce provides options to help ease immediate pain points. Be aware, however, that enabling these features in your org is not a viable long-term architectural strategy for dealing with large data volumes. These short-term, stopgap workarounds can help reduce latency in systems suffering from poor data architecture, but they can also add technical debt to your org.
Short-term workarounds for scale issues include:
The list of patterns and anti-patterns below shows what proper (and poor) data volume management looks like in a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.
To learn more about Salesforce tools for managing data volumes, see Tools Relevant to Reliable.
Scale testing is a specific area within performance testing. Performance testing overall is concerned with how well an application or system holds up under a variety of conditions — including differing levels of demand. If your Salesforce system handles B2C-scale applications, has a large volume of users, or has large volumes of record data, ensure you incorporate performance and scale testing into your application lifecycle management (ALM) process.
To maximize the benefits of performance and scale testing, make it an integral, intentional part of your quality assurance processes at a reasonably early stage in the development cycle. (For a more in-depth look at Salesforce testing and recommendations for structuring test cycles, see Testing Strategy.)
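A scale test at its simplest ramps concurrency in steps and watches where throughput stops scaling linearly. The sketch below simulates that shape in Python; `simulate_request` is a hypothetical stand-in for a real API call against your Full sandbox:

```python
# A minimal scale-test sketch: ramp concurrency in steps and record
# throughput at each step to find where it stops scaling linearly.
# simulate_request is a stand-in for a real request against a test org.
import time
from concurrent.futures import ThreadPoolExecutor

def simulate_request():
    time.sleep(0.01)  # stand-in for real request latency
    return True

def measure_throughput(concurrency, requests=50):
    """Run the given number of requests at a fixed concurrency level
    and return completed requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: simulate_request(), range(requests)))
    elapsed = time.perf_counter() - start
    return sum(results) / elapsed

for c in (1, 5, 10):
    print(f"concurrency {c}: {measure_throughput(c):.0f} req/s")
```

In a real test you would also record latency percentiles and error rates at each step; the concurrency level where throughput plateaus (or errors climb) is your current scaling ceiling.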
To create effective scale tests for Salesforce, consider:
For more in-depth guidance on how to conduct scale and performance testing, see How to Scale Test and 7 Anti-Patterns in Performance and Scale Testing on the Salesforce Architect blog.
The list of patterns and anti-patterns below shows what proper (and poor) scale testing looks like for a Salesforce org. You can use these to validate your designs before you build, or identify opportunities for further optimization.
To learn more about Salesforce tools for scale testing, see Tools Relevant to Reliable.
The following table outlines patterns to look for (or build) in your org, and anti-patterns to avoid or target for remediation.
Data Model Optimization

Patterns:
- In your design standards:
  - Standards and guidance exist for which business justifications warrant a custom object
- In your data model:
  - Standard objects are used where possible
  - Tables have been denormalized for scale
- Within your business:
  - Low-code builders understand the different field types supported by Salesforce, and evaluate reporting and encryption requirements before selecting field data types
  - Sharing and data skew implications are evaluated before choosing to establish a master-detail relationship between objects

Anti-patterns:
- In your design standards:
  - No standards for creating custom objects exist
- In your data model:
  - You have replicated standard objects
  - Tables have been normalized to avoid redundancy
- Within your business:
  - Low-code builders select data types without evaluating downstream reporting and encryption requirements
  - Sharing and data skew are not considered before establishing master-detail relationships between objects

Data Volume Management

Patterns:
- In your data:
  - No parent records have more than 10,000 child records
  - No users are assigned to more than 10,000 records of the same object type
  - No instances exist where more than 10,000 records have lookup fields that point to the same record
- In Flow and Apex:
  - Logic exists to distribute the number of child records across multiple parent records in scenarios where data skew is a concern
  - Logic exists to assign all records to the appropriate human users when imported or replicated via an integration
- Within your business:
  - You have documented and implemented a data archiving and purging strategy

Anti-patterns:
- In your data:
  - Parent records with more than 10,000 child records exist
  - Users are assigned to more than 10,000 records of the same type
  - Instances exist where more than 10,000 records have lookup fields that point to the same record
- In Flow and Apex:
  - Child records are arbitrarily assigned to parent records regardless of the number of existing child records that have already been assigned
  - Records created via data loads or integrations are assigned to a generic "integration user"
- Within your business:
  - You do not have a data archiving and purging strategy, or your strategy has been documented but not implemented

Scale Testing

Patterns (within your business):
- You include scale testing as part of your QA process when you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests are prioritized on the highest-value aspects of the system
- Your scale tests have well-defined criteria
- You conduct scale testing in a Full sandbox

Anti-patterns (within your business):
- You don't conduct scale tests as part of your QA process even though you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests aren't prioritized
- Your scale tests don't have well-defined criteria
- You conduct scale tests in a Partial Copy or Developer sandbox
- Apex Hammer Tests: Learn about Salesforce Apex testing in current and new releases
- Backup and Restore: Automatically generate backups to prevent data loss
- Big Objects: Store and manage large volumes of data on-platform
- Custom Indexes: Improve query performance with custom indexes
- Defer Sharing Calculations: Process sharing rules after loading data
- Deleting Data: Remove unneeded data to improve performance
- Divisions: Partition data to limit record counts in queries and reports
- Lightning Data Service: Perform database operations without code
- Lightning Platform Query Optimizer: Streamline data access to optimize queries
- Lightning Usage App: Analyze performance for popular pages
- Performance Assistant: Test system performance and interpret the results
- Platform Cache: Improve performance and reliability when caching data
- Query Optimizer: Build efficient SOQL queries, reports, and list views
- Salesforce Lightning Inspector: Inspect data about running applications and components
- Salesforce Page Optimizer: Analyze Experience Cloud page performance
- Skinny Tables: Avoid joins on tables with frequently used fields
- 7 Anti-Patterns in Performance and Scale Testing: Avoid common anti-patterns in performance and scale testing
- Architecting for Analytics: Model data for optimal analysis and decision making
- Best Practices for Deployments with Large Data Volumes: Understand process impacts of large data volumes
- Custom Field Types: Review the different types of custom fields
- Data Model Design Considerations: Optimize data models for scale and maintenance
- Design Standards Template: Create design standards for your organization
- Designing Record Access for Enterprise Scale: Optimize access control performance through configuration
- How to Scale Test on Salesforce: Approach scale testing in five steps
- Improve Inefficient Related Lists: Learn how to design efficient related lists
- Improve SOQL Query Performance: Understand selectivity and custom indexes
- Infrastructure for Systems with Large Data Volumes: Learn about capabilities that support system performance with LDV
- Learning Resources for Batch Management: Learn about Batch Management
- Lightning Platform Query Optimization FAQ: Review answers to questions about the Force.com Query Optimizer
- Managing Lookup Skew in Salesforce to Avoid Record Lock Exceptions: Understand how to minimize the effects of lookup skew
- Notes on Changing Custom Field Types: Understand considerations for field type conversions
- Query Plan FAQ: Optimize SOQL queries involving large volumes
- Scale Test Strategy Checklist: Create and customize scale and performance test plans
- SOQL and SOSL Best Practices: Follow SOQL and SOSL best practices for LDV
- Technical Requirements for Lightning Experience: Understand Lightning Experience technical requirements
- Techniques for Optimizing Performance: Optimize Salesforce performance for large data volumes
- Tools for Large-Scale Realignments: Plan and execute realignments effectively
- Using Mashups: Maintain large data sets in a different application
- Working with Very Large SOQL Queries: Improve SOQL query efficiency