Introduction

Resilient solutions handle change well. Resiliency is the ability to quickly and effectively recover from a problem or failure. Resilience is grounded in two key qualities: toughness and elasticity. Toughness in a system enables it to withstand and endure difficulties. Elasticity means a system is able to return to an ideal state or shape. Architecting for resilience means combining these two aspects into your systems to create strength and flexibility in the face of change — whether that change is intentional or unplanned.

In technology contexts, a system demonstrates resilient behavior by continuing to function, or by quickly returning to a stable state, even if individual pieces in the system fail. Code and configuration defects can create unexpected (and unwanted) behavior, as can network and hardware issues. As a result, every component in your architecture has the potential to fail.

You can improve the resilience of your Salesforce solutions by focusing on three key habits: application lifecycle management, incident response, and continuity planning.

Application Lifecycle Management

Application lifecycle management (ALM) is a software development practice that focuses on how software is created, delivered, and managed, from idea through end-of-life. Encompassing people, processes, and tools, ALM is a holistic way of looking at the big picture of how applications are conceived, approved, built, delivered, and managed, along with the more specific disciplines (including DevOps, specific delivery methodologies, testing strategies, governance, and CI/CD) that might be involved.

Healthy ALM means that the business can react quickly to changes, and applications can keep pace without compromising stability or quality. It is a cornerstone of resiliency. Without clear and practical ALM, teams will struggle at every stage of app creation, delivery, and maintenance. Symptoms of poor ALM include:

Because ALM touches nearly every aspect of a solution, establishing clear and effective ALM practices is a key part of architectural work.

You can build better ALM practices by focusing on three key areas: release management, environment strategy, and testing strategy.

Release Management

Release management involves planning, sequencing, controlling, and migrating changes into different environments. Technically, a release can be any time you move changes into an environment. In this context, the term release refers to an intentional group of changes, moved into a given environment at the same time.

Introducing change into a stable system causes that system to transition from a stable state to a new state. During this transition, the system is vulnerable to further changes triggering an uncontrolled, unstable state — which can cause a critical incident. From an architectural point of view, designing for resilient releases is more than just ensuring individual changes undergo effective testing; it also includes planning for how changes are introduced into your systems (and to users) safely.

It’s critical to be clear about what, how, and how often changes will move into the system. If your project or business has official change management and enablement processes and teams, their work depends on predictable and accurate release information. Business stakeholders also care about release information — especially as it relates to features or bug fixes they’ve requested. Establishing consistent and clear release schedules, and shipping stable release artifacts, is an effective way to build trust in your solution and demonstrate value to your stakeholders.
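One way to keep release information predictable is to tag every work item with a release name and derive the release manifest from those tags. The following is a minimal sketch in Python, not a feature of any Salesforce tool; the `WorkItem` fields, item keys, and release names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """A hypothetical development item tagged with a release name."""
    key: str
    summary: str
    release: str  # illustrative release name, e.g. "2024.06-service"


def build_manifests(items):
    """Group work items by release name so each release has one clear manifest."""
    manifests = defaultdict(list)
    for item in items:
        manifests[item.release].append(item.key)
    # Sort the keys so the manifest is stable between runs.
    return {release: sorted(keys) for release, keys in manifests.items()}


items = [
    WorkItem("W-102", "Fix case routing bug", "2024.06-service"),
    WorkItem("W-101", "Add case priority field", "2024.06-service"),
    WorkItem("W-205", "New quote approval flow", "2024.07-sales"),
]
manifests = build_manifests(items)
print(manifests["2024.06-service"])
```

Because the manifest is derived rather than maintained by hand, an item with a missing or misspelled release name shows up as an unexpected group, which makes tagging mistakes easier to spot.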

To enable effective release management for Salesforce, consider:

Note: All of these release mechanisms use source-driven development and SFDX. You can mix-and-match all of these approaches to create the right release structures for your company and teams. You do not have to take an all-or-nothing approach. All of these options are fundamentally compatible with each other.

The list of patterns and anti-patterns below shows what proper (and poor) release management looks like for a Salesforce org. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about Salesforce tools for release management, see Tools Relevant to Resilient.

Environment Strategy

Salesforce provides a variety of environments for you to use during application development and testing cycles. An effective environment strategy for Salesforce requires understanding how to use different environments and what good management looks like. This is a key competency for healthy ALM cycles. Environments are where work gets done. Their usefulness in ALM comes from the level of fidelity to production they provide, along with their isolation from production.

Compared to a poor environment strategy (or no strategy at all), a good environment strategy provides several benefits:

Teams often struggle to realize these benefits. Challenges to getting the most out of your development environments and strategy come from many sources. One key source is the type of development model your teams follow. In the older org-based development approach, environments had more than one role to fulfill. They had to be the place where various kinds of work happened, and they also had to be the source for your release artifacts (that is, the metadata that you want to deploy in a release). This often meant environments were not easy to set up or tear down, they were often overcrowded and full of metadata conflicts between teams, and they did not contribute meaningful speed or flexibility to ALM overall.

Using a source-based development model fundamentally shifts the relationship environments have to your releases and release artifacts. In this model, source control is the source of the metadata you want to release. Environments are just places where work gets done.

However, the source-based development model is not a guarantee of good environment strategy by itself. Even with source control, teams can still struggle to set up conditions to test external system integrations, configurations that depend on metadata not in source control (like managed packages or customizations that depend on data), and so on. This can lead to issues similar to those seen with an org-based model.

To develop an effective environment strategy, consider:

| Features | Scratch Org | Developer Sandbox | Developer Pro Sandbox | Partial Copy Sandbox | Full Sandbox |
| Supports org shape | Yes | No | No | No | No |
| Supports source tracking | Yes | Yes | Yes | No | No |
| Lifespan | 1 - 30 days | Manually controlled | Manually controlled | Manually controlled | Manually controlled |
| Refresh Interval | N/A | 1 day | 1 day | 5 days | 29 days |
| Release Preview Support | Developer controlled | Based on sandbox instance | Based on sandbox instance | Based on sandbox instance | Based on sandbox instance |
| Provisioning Time | >5 minutes | Hours - Days | Hours - Days | Hours - Days | Hours - Days |
| Metadata determined by | Source control | Production | Production | Production | Production |
| Data determined by | Manual data load | Manual data load | Manual data load | Sandbox template | Production |
| Data limit | 200 MB | 200 MB | 1 GB | 5 GB | Matches Production |
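The comparison above can also be treated as data and queried when choosing an environment for a task. The sketch below is illustrative only: it encodes three rows of the table (org shape, source tracking, and data limit) in Python, assumes 1 GB equals 1024 MB, and uses a hypothetical `compatible` helper that is not part of any Salesforce tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    name: str
    org_shape: bool
    source_tracking: bool
    data_limit_mb: Optional[int]  # None means "matches production"


# Values transcribed from the comparison table above (1 GB assumed to be 1024 MB).
ENVIRONMENTS = [
    Environment("Scratch Org", True, True, 200),
    Environment("Developer Sandbox", False, True, 200),
    Environment("Developer Pro Sandbox", False, True, 1024),
    Environment("Partial Copy Sandbox", False, False, 5 * 1024),
    Environment("Full Sandbox", False, False, None),
]


def compatible(org_shape=False, source_tracking=False, min_data_mb=0):
    """Return the names of environments that satisfy every stated requirement."""
    matches = []
    for env in ENVIRONMENTS:
        if org_shape and not env.org_shape:
            continue
        if source_tracking and not env.source_tracking:
            continue
        # An environment with no fixed limit (Full Sandbox) fits any data size.
        if env.data_limit_mb is not None and env.data_limit_mb < min_data_mb:
            continue
        matches.append(env.name)
    return matches


print(compatible(source_tracking=True, min_data_mb=500))
```

A real selection process would also weigh refresh intervals, provisioning time, and licensing, which this sketch leaves out.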

Here is how different features map to common development tasks, along with compatible environment recommendations:

Task Org Shape Source Tracking Frequent Refreshes Release Preview Support All metadata from production Partial Metadata from production Large datasets from production Partial datasets from production Compatible Environments
Prototyping X X X X X X X Scratch Orgs, Developer and Developer Pro Sandboxes
New feature investigations or proof-of-concept development X X X X X X X Scratch Orgs, Developer and Developer Pro Sandboxes
User acceptance testing X X X X X X Developer, Developer Pro and Partial Copy sandboxes
Performance and scale testing X X X Full sandbox
User Training X X X X X* X Developer Pro, Partial Copy, and *Full sandboxes
*If required to complete a specific kind of work, otherwise use a less resource-intense environment

The list of patterns and anti-patterns below shows what proper (and poor) environment management looks like for a Salesforce org. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools for environment management, see Tools Relevant to Resilient.

Testing Strategy

A test strategy is the set of guiding principles and standards for how you plan and run tests that gauge the success or failure of your applications during your ALM processes. Test strategy keeps every stakeholder involved in testing aligned with the priority, purpose, and scope of a given test, and helps project teams create effective and thoughtful test plans.

Typically, developers or quality assurance/testing experts will be involved in creating and executing specific tests. Test strategy helps ensure that these individuals know what kinds of tests need to be conducted for a given project, in what sequence tests should occur, and what is needed for building well-formed tests, test plans, and artifacts (for example, test data sets, devices, traffic or network simulators, and so on).

Effective testing strategy creates a clear picture of how, when, where, and why to run different test types (including unit tests, UI tests, and regression tests) in various combinations and conditions to uncover how your system (including any in-flight changes) will behave. An effective test strategy produces tests that better show you how well the system conforms to non-functional requirements (such as scalability, reliability, and usability) that can be difficult to measure through a single kind of test.

To create effective testing strategies for Salesforce, consider:

ALM Patterns and Anti-Patterns

The following table shows a selection of patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.

✨ Discover more patterns for ALM in the Pattern & Anti-Pattern Explorer.

Release Management

Patterns, in production:
- Metadata shows use of stable release mechanisms, such as:
-- Metadata organized into unlocked packages
-- DevOps Center is active and installed
-- Deployments via Metadata API use source format
- Deployment logs show no failed deployments within the available history
- Deployment history shows clear release cadences and fairly uniform deployment clusters within release windows

Anti-patterns, in production:
- Metadata indicates use of org-based release mechanisms, such as:
-- Active use of change sets
-- Deployments via Metadata API use package.xml format
- Deployment logs show repeated instances of failed deployments within the available history
- Deployments have no discernible cadence or show uneven clusters of deployments (signs of hot-fix and ad hoc rollbacks)
- DevOps Center is not enabled and installed

Patterns, in your roadmap and documentation:
- Release names are clear
- Features are tied clearly to a specific, named release
- Release names are searchable and discoverable
- Teams can find and follow clear guidelines for tagging artifacts, development items, and other work with the correct release names
- It is possible to pull together a clear view of a release manifest by release name
- Quality thresholds for generative AI apps are defined for different development stages

Anti-patterns, in your roadmap and documentation:
- Release names are absent
- Features are not tied clearly to a specific release
- Release names are ad hoc or do not exist
- Teams refer to artifacts, development items, and other work in different ways
- It is not possible to pull together a clear view of a release manifest using a release name
- Quality thresholds for generative AI apps are not defined, or are not defined at different development stages

Environment Strategy

Patterns, in your orgs:
- A source-driven development and release model is adopted
- Source tracking is enabled for Developer and Developer Pro sandboxes
- Metadata in a given environment is independent from your release artifacts
- Environments do not directly correspond to a release path
- Release paths for a change depend on the type of the change (high risk, medium risk, low risk)
- Overcrowded environments do not exist
- Risky configuration changes are never made directly in production
- No releases occur during peak business hours

Anti-patterns, in your orgs:
- An org-based development and release model is adopted
- Source tracking is not enabled for Developer and Developer Pro sandboxes
- Metadata in a given environment is your release artifact
- Environments directly correspond to a release path
- The release path for every change is the same
- Overcrowded environments exist
- Risky configuration changes are made directly in production
- Releases occur during peak business hours

Testing Strategy

Patterns, within your business:
- Usability tests employ a variety of devices and assistive technology
- Simulators are used to replicate production-like conditions for scalability and performance testing
- Tests are automated to run when changes come into source control
- Endurance, stress, performance, and scale tests are run at several intervals in the application development cycle and considered on-going tasks
- You include scale testing as part of your QA process when you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests are focused on priority aspects of the system
- Your scale tests have well-defined criteria
- You conduct scale testing in a Full sandbox
- Prompt engineering includes a quality review by a human

Anti-patterns, within your business:
- Usability tests are not conducted, or are conducted on a limited set of devices
- Production-like volumes of user requests, API traffic, and variations in network speed are not tested
- Test automation is not in place
- Endurance, stress, performance, and scale tests are considered a phase or stage of development
- You don't conduct scale tests as a part of your QA process and you have B2C-scale apps, large volumes of users, or large volumes of data
- Your scale tests aren't prioritized
- Your scale tests don't have well-defined criteria
- You conduct scale tests in a Partial Copy or Developer sandbox
- Prompt engineering lacks a quality review by a human

Patterns, in your org:
- All test data is scrubbed of sensitive and identifying data

Anti-patterns, in your org:
- Test data is identical to production data

Patterns, in Apex:
- Data factory patterns are used for unit tests
- Mocks/stubs are used to simulate API responses

Anti-patterns, in Apex:
- Your unit tests are reliant on org data
- Mocks/stubs are not used

Patterns, in your design standards and documentation:
- Environments are classified by what type of tests they can support
- Appropriate test regimes are specified according to risk, use case, or complexity

Anti-patterns, in your design standards and documentation:
- It is not clear which environment can support what type of tests
- Test regimes are not categorized by risk, use case, or complexity
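
The data factory and mock/stub patterns listed for Apex above are not specific to one language. As a rough illustration of the idea, the sketch below uses Python rather than Apex: a factory function builds test records in code instead of relying on existing org data, and a stub stands in for an external API. The function names, record fields, and the `get_rating` call are all hypothetical.

```python
from unittest.mock import Mock


def make_account(**overrides):
    """Data factory: build a test record in code instead of relying on org data."""
    record = {"Name": "Test Account", "Industry": "Technology", "AnnualRevenue": 1_000_000}
    record.update(overrides)
    return record


def enrich_account(account, client):
    """Code under test: adds a credit rating fetched from an external API."""
    response = client.get_rating(account["Name"])
    return {**account, "Rating": response["rating"]}


# Stub the external API so the test never makes a real callout.
stub_client = Mock()
stub_client.get_rating.return_value = {"rating": "A"}

result = enrich_account(make_account(Name="Acme"), stub_client)
print(result["Rating"])
```

The test controls both its input data and the simulated API response, so it behaves the same way in any environment.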

Incident Response

In security and site reliability engineering (SRE), incident response is focused on how teams identify and address events impacting the overall availability or security of a system, as well as how teams work to address root causes and prevent future issues. Incident response involves the processes and tools as well as the organizational behaviors required to address issues in real-time and in the period after an issue occurs.

As an architect, you may not be the person monitoring your solution’s operations on a day-to-day basis once it goes live. Part of architecting for resilience is designing capabilities that enable support teams to perform first-level diagnosis, stabilize systems, and effectively hand over the investigation and root cause mitigation to development or maintenance teams. Teams directly supporting users on a day-to-day basis may not have deep understanding or expertise in the architecture of the system. It is essential for these teams to have appropriate tools and processes for monitoring daily operations, accessing information from the system when diagnosing a potential incident, and helping them serve as effective first-responders for any issues impacting availability.

You can improve how well teams respond to incidents in your Salesforce solutions by focusing on time to recover, ability to triage, as well as monitoring and alerting.

Time to Recover

When an incident occurs, the first priority must be restoring systems to a stable operational state. Often, businesses think the only way to recover from an incident is to “fix the problem”. This is directionally sound — accurate root cause analysis and remediation is how you ultimately resolve critical issues in a system. However, this approach is not the most practical in the early stages of crisis response. Depending on the severity of an incident, every second of an outage or incident could create revenue (or reputation) loss for the business.

Often, attempting to diagnose and address root causes will delay efforts to restore the system to operation. Logistically, adopting an approach that asks incident responders to address root causes puts tremendous strain on subject matter experts (SMEs) and support staff at your company. Working to find and fix root causes during an incident requires SMEs to be on-call for every incident, and can block front-line/customer-facing support staff from taking action. It can also result in teams releasing changes that, in turn, create more incidents. Ultimately, such an approach increases costs, consumes bandwidth across teams, and creates behaviors in times of crisis that can erode customer trust and brand reputation.

The right incident management paradigm is to prioritize and focus on recovery as a first step. After the system is restored to stability, then follow up with blameless postmortems, incident investigations, root cause remediation, and similar activities. This order of operations better enables incident response staff to triage, diagnose, and execute recovery tactics, alerting relevant SMEs to assist only as necessary. It also enables SMEs to identify and fix root causes with less pressure from a ticking incident clock.

To adopt a recovery-first mindset to incident response, consider:

| Incident Type | Apparent Trigger | Recovery Tactics |
| System outage | Corrupted logins or issues with account access | Carry out account recovery policy |
| System outage | Service unavailable | Activate redundant/backup service, manual workarounds |
| Production bug | Recent change | Deployment rollback or prior version re-deploy |
| Production bug | Emergent / unexplained bug | Manual workarounds, disable non-essential features, escalate to SMEs |
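
A classification like the one above can be kept as a simple lookup so that first responders reach a recovery tactic without first diagnosing the root cause. The sketch below is a hedged illustration in Python: it assumes each incident type in the table covers two triggers, paraphrases the tactics, and adds a hypothetical fallback for combinations the table does not cover.

```python
# Recovery tactics keyed by (incident type, apparent trigger), paraphrased from the table above.
RECOVERY_TACTICS = {
    ("System outage", "Corrupted logins or issues with account access"): [
        "Carry out account recovery policy",
    ],
    ("System outage", "Service unavailable"): [
        "Activate redundant/backup service",
        "Manual workarounds",
    ],
    ("Production bug", "Recent change"): [
        "Roll back the deployment or re-deploy the prior version",
    ],
    ("Production bug", "Emergent / unexplained bug"): [
        "Manual workarounds",
        "Disable non-essential features",
        "Escalate to SMEs",
    ],
}

# Hypothetical fallback for combinations the table does not cover.
DEFAULT_TACTICS = ["Escalate to SMEs"]


def recovery_tactics(incident_type, trigger):
    """Look up the documented tactics; fall back to escalation when none are defined."""
    return RECOVERY_TACTICS.get((incident_type, trigger), DEFAULT_TACTICS)


print(recovery_tactics("System outage", "Service unavailable"))
```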

The list of patterns and anti-patterns below shows what architecting to prioritize recovery looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools to help with time to recover, see Tools Relevant to Resilient.

Ability to Triage

In the context of technology, triage involves assigning categories and levels of severity to issues and support requests. No matter how well planned your solution is, user support issues and requests will arise. These can range from issues that stem from lack of sufficient training or change management, gaps in UI/UX, and unexpected end-user behaviors, to urgent system issues not caught by monitoring or alerting.

Support and operations teams need the ability to investigate user support queries efficiently and diagnose them quickly. Triaging issues to filter out less severe concerns and quickly spot critical system incidents is a key competency for these teams. Poor triaging slows all levels of user support, prolongs critical incidents, and increases the risk of further disruptions to your customers and your business.
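
To make the idea of assigning categories and severity levels concrete, here is a small sketch in Python. The categories, the `SupportRequest` fields, and the thresholds are illustrative assumptions; real rules belong in your own support policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class SupportRequest:
    summary: str
    category: str  # hypothetical categories: "training", "ui", "defect", "outage"
    users_affected: int
    blocks_work: bool


def triage(request):
    """Assign a severity using illustrative rules, not an official scheme."""
    if request.category == "outage" or (request.blocks_work and request.users_affected >= 100):
        return Severity.CRITICAL
    if request.blocks_work:
        return Severity.HIGH
    if request.category == "defect":
        return Severity.MEDIUM
    return Severity.LOW


queue = [
    SupportRequest("How do I export a report?", "training", 1, False),
    SupportRequest("Login page returns errors for all agents", "outage", 250, True),
    SupportRequest("Save button misaligned on mobile", "ui", 12, False),
]
# Work the most severe requests first.
ordered = sorted(queue, key=triage, reverse=True)
print(ordered[0].summary)
```

Sorting the queue by severity filters out less severe concerns and brings potential critical incidents to the top.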

Although you may not be involved in day-to-day operation and support, as an architect, it is your responsibility to help ensure support and operations teams can effectively triage issues in any solution you create on the Salesforce platform.

To enable teams to effectively triage issues within your Salesforce solutions, consider:

The list of patterns and anti-patterns below shows what architecting for effective triaging looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools to help with triage, see Tools Relevant to Resilient.

Monitoring and Alerting

Monitoring and alerting are widely used terms in site reliability engineering. In the context of system resiliency, monitoring is the ability to continuously assess the current state of a system and alerting is the ability to automate notifications to stakeholders about potential concerns about the state of the system. Effective monitoring and alerting is a key part of decoupling the scale and growth of your system from the scale and growth of your support staff.
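
The distinction between monitoring and alerting can be sketched as a single check that always records the current state but only notifies a person when action is likely to be needed. The Python below is an illustration under stated assumptions: the metric name, the 80% warning ratio, and the `evaluate` helper are hypothetical and do not correspond to a Salesforce API.

```python
import logging

logger = logging.getLogger("org_monitor")


def evaluate(metric, used, limit, warn_ratio=0.8):
    """Return "alert" when a person should act, otherwise "log".

    warn_ratio is a hypothetical threshold: alerting at 80% of a limit gives
    responders time to act before a failure actually happens.
    """
    ratio = used / limit
    if ratio >= warn_ratio:
        logger.warning("%s at %.0f%% of limit; notify the on-call owner", metric, ratio * 100)
        return "alert"
    # Below the threshold the value is only recorded, so it stays reportable.
    logger.info("%s at %.0f%% of limit", metric, ratio * 100)
    return "log"


print(evaluate("daily_api_requests", 850, 1000))
print(evaluate("daily_api_requests", 300, 1000))
```

Keeping routine readings in logs and reserving alerts for cases that need a response helps the volume of alerts stay manageable as the system grows.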

Salesforce provides a variety of built-in capabilities to monitor behaviors in your system. Salesforce also offers real-time event monitoring as an add-on or as part of Salesforce Shield. In any Salesforce solution, designs architected for monitoring and alerting provide:

To architect for effective monitoring and alerting within your Salesforce solutions, consider:

The list of patterns and anti-patterns below shows what architecting for effective monitoring and alerting looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify areas of your system that need to be refactored.

To learn more about Salesforce tools for monitoring and alerting, see Tools Relevant to Resilient.

Incident Response Patterns and Anti-Patterns

The following table shows a selection of patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.

✨ Discover more patterns for incident response in the Pattern & Anti-Pattern Explorer.

Time to Recover

Patterns, within your business:
- Recovery protocols are practiced at regular intervals
- Teams know what services in production they are responsible for owning

Anti-patterns, within your business:
- Recovery protocols don't exist or aren't practiced at regular intervals
- It is unclear which teams are responsible for different services in production

Patterns, in your documentation:
- Recovery tactics are defined and classified by incident type and trigger
- Exit criteria for incident responses exist in your SLOs and are clear
- Activation criteria and assignment logic for elevated permissions during incidents are clear
- Incident response permission sets and authorizations are clearly listed

Anti-patterns, in your documentation:
- Incident response is performed ad hoc
- Exit criteria for incident responses do not exist
- Elevated permissions are not assigned, or assigned ad hoc
- Incident response permission sets and authorizations are not listed

Patterns, in your org:
- Session-based permission sets for incident response exist and can be assigned to support staff during recovery
- Setup Audit Trail shows designated recovery testers have logged into the testing environment at the agreed-upon time and have followed recovery test scripts

Anti-patterns, in your org:
- Session-based permission sets do not exist for incident response, or are not authorized for support staff to use
- Setup Audit Trail shows designated recovery testers have not logged into the testing environment or did not follow recovery test scripts

Patterns, in your test plans:
- Test scripts for recovery testing exist and are repeatable
- Environments for incident simulations are clearly listed

Anti-patterns, in your test plans:
- Test scripts do not exist for recovery testing
- Environments are not established for incident simulations

Ability to Triage

Patterns, within your business:
- SMEs or stakeholders who should be alerted to support complex issues are identified before an incident occurs
- The hand-off between delivery and support teams is a part of go-live
- If consulted, Salesforce architects respond quickly and help the team stay focused on recovery

Anti-patterns, within your business:
- SMEs or stakeholders who should be alerted aren't identified until an incident occurs
- The hand-off between delivery teams and support teams isn't a part of the release process
- Salesforce architects consider incident response to be outside their scope of work

Patterns, in your documentation:
- System and design patterns used in a given solution are discoverable and readable by support staff

Anti-patterns, in your documentation:
- System and design patterns used in a given solution are not readily available to support staff

Patterns, in your org:
- Logging and custom error messages are incorporated into execution paths throughout the system

Anti-patterns, in your org:
- Logging and custom error messages are not used

Monitoring and Alerting

Patterns, in your org:
- Alerts are only used to inform users of scenarios that require human intervention; other failures are logged and reportable
- Alerts are sent to users who are capable of responding to them
- When possible, alerts are delivered in advance of a potential failure

Anti-patterns, in your org:
- Alerts are sent when any type of failure occurs, regardless of whether follow-on actions are required
- Alerts about issues requiring technical solutions are delivered to business users
- Alerts are only delivered in response to failures that have already occurred

Patterns, in your documentation:
- Entry criteria for prompt tuning alerts are defined based on direct and indirect generative AI feedback metrics

Anti-patterns, in your documentation:
- There are no criteria defined for triggering prompt tuning alerts for generative AI apps

Continuity Planning

A key to business resilience is continuity planning, which focuses on how to enable people and systems to function through issues caused by an unplanned event. Business continuity plans (BCPs) take a people-oriented view of how to keep processes moving forward through crisis. Technical aspects of continuity planning are contained in the disaster recovery portions of a BCP. For more on this topic, see Technology Continuity.

Without adequate continuity plans, your organization may be paralyzed in the event of a crisis or system outage. Ineffective continuity planning can have catastrophic impact on customers, stakeholders, and business. In the wake of an adverse event, each moment that passes without maintaining or recovering critical processes risks financial and reputational damage, and can put employee safety and even regulatory compliance in jeopardy.

You can build better continuity planning into your systems by focusing your efforts in three areas: defining business continuity for Salesforce, planning for technology continuity, and building backup and restore capabilities.

Business Continuity

Your company may already have a BCP in place. If this is the case, make sure Salesforce is included. If your company doesn’t have a BCP, work with your stakeholders to create one that covers your Salesforce org(s).

Salesforce will likely play a unique role in business continuity plans, because of the role it occupies in the system landscape. Salesforce is often relied upon to be a source of truth for customer data and essential business processes, across many business divisions. As such, the role Salesforce plays in a BCP may differ from other systems. It is likely that Salesforce will be involved in many high-priority areas for recovery.

To create relevant business continuity planning for Salesforce systems, consider:

The list of patterns and anti-patterns below shows what proper (and poor) continuity planning looks like for a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about Salesforce tools for defining business continuity, see Tools Relevant to Resilient.

Technology Continuity

The goal of technology continuity is to make sure the business won’t be prevented from maintaining essential operations due to issues with the components in a system. Salesforce prioritizes maintaining our services at the highest levels of availability, and providing transparent information about any issues. You can see real-time information about Salesforce system performance and issues at trust.salesforce.com. As an architect building on Salesforce, your solutions benefit from the site reliability, security, and performance capabilities that Salesforce provides across the entire platform.

However, the overall continuity of your Salesforce solutions extends beyond the built-in services Salesforce provides. From an architectural perspective, Salesforce technology continuity planning has to begin with asking (and answering) questions about how Salesforce fits into your larger enterprise landscape. What kind of systems integrate with Salesforce? How do external systems depend on processes or information in Salesforce? In your Salesforce orgs, what processes or functionality rely on AppExchange solutions? Do your users access Salesforce through third-party identity services or SSO?

To build better technology continuity in your Salesforce systems, consider:

Treat any items that come out of your post-incident reviews like other development items, and add them to your planning systems to be prioritized and worked on.

The list of patterns and anti-patterns below shows what proper (and poor) technology continuity planning looks like within a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about Salesforce tools for technology continuity planning, see Tools Relevant to Resilient.

Building Backup and Restore Capabilities

Restoring backed-up copies of data or metadata can help return your org to its last known stable state and provide a failover system that can be used in the event of a catastrophic system failure or outage. Backing up your data and metadata regularly and storing your encrypted, backed-up copies in a secure location adds an additional layer of resilience to your architecture.

Without backup and restore strategies you will not be able to restore clean versions of your production data and metadata when data is maliciously corrupted, when defects inadvertently make their way into production, or when a failure during a large data load corrupts production data. Any one of these scenarios can result in your business-critical production data becoming corrupt or even permanently lost. Setting up backup and restore technology offers a number of advantages aside from continuity planning, including assisting with large data volume mitigation strategies, adhering to compliance-related retention policies, and more.

To help ensure continuity with backup and restore strategies in your Salesforce solutions, consider:

You may require a more granular backup strategy if your data volumes are so large that a full backup doesn’t have time to complete before the next backup starts running or if your organization’s data changes so frequently that the updates are mission-critical to your organization.

Here are ways to shape a more granular backup strategy:

Always continue to perform full backups. It’s important to note that you should never eliminate full backups completely, even if data volumes result in long run times. In the case of large data volumes, plan for regular, but infrequent full backups (weekly for example) in conjunction with more frequent partial or object-specific backups (nightly or every X number of hours, for example). This will give you the flexibility to reconstruct the most complete and accurate dataset to use in your restore processes.
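
As a rough illustration of combining infrequent full backups with more frequent partial ones, the Python sketch below picks a backup type per day and works out which backups a restore would need. The Sunday schedule, the one-backup-per-day cadence, and the function names are illustrative assumptions, not a prescribed configuration.

```python
from datetime import date, timedelta


def backup_type(day, full_weekday=6):
    """Weekly full backup (Sunday by default, an illustrative choice); partial otherwise."""
    return "full" if day.weekday() == full_weekday else "partial"


def restore_chain(backup_days, target):
    """Return the latest full backup on or before target, plus later partials up to target."""
    candidates = sorted(d for d in backup_days if d <= target)
    fulls = [d for d in candidates if backup_type(d) == "full"]
    if not fulls:
        raise ValueError("no full backup available on or before the target date")
    last_full = fulls[-1]
    return [d for d in candidates if d >= last_full]


start = date(2024, 6, 2)  # a Sunday
days = [start + timedelta(days=n) for n in range(10)]  # one backup per day
chain = restore_chain(days, date(2024, 6, 12))
print([(d.isoformat(), backup_type(d)) for d in chain])
```

Without any full backup in the history, the restore chain cannot be built at all, which is why full backups are never eliminated even when partial backups run more often.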

The list of patterns and anti-patterns below shows what proper (and poor) backup and restore capabilities look like within a Salesforce solution. You can use these to validate your designs before you build, or identify places in your system that need to be refactored.

To learn more about Salesforce tools for backup and restore, see Tools Relevant to Resilient.

Continuity Planning Patterns and Anti-Patterns

The following table shows a selection of patterns to look for (or build) in your org and anti-patterns to avoid or target for remediation.

✨ Discover more patterns for continuity planning in the Pattern & Anti-Pattern Explorer.

Business Continuity

Patterns
Within your business:
- A "recovery first" mindset is adopted with a focus on bringing the highest priority business functions and capabilities out of impact as soon as possible
- There is a maintenance schedule for the review of BCP test plans
In your documentation:
- A BCP exists that contains steps to continue processing or triage data if Salesforce becomes unavailable, a list of events that can trigger the use of the BCP, and steps and intervals for BCP testing
- Your BCP includes upstream and downstream systems and dependencies
In your test plans:
- The areas of your BCP related to processes and people are accounted for

Anti-Patterns
Within your business:
- A "fix-the-problem" mentality is the only approach to incident management
- BCP test plans are not refreshed at regular intervals
In your documentation:
- A BCP does not exist, is incomplete, or includes only Salesforce
In your test plans:
- The areas of your BCP related to processes and people are not accounted for

Technology Continuity

Patterns
Within your business:
- You have evaluated if you need to build intentional redundancy or fail-over systems
- Incident recovery tactics are automated wherever possible
In your documentation:
- Your BCP accounts for additional resources or break-glass procedures teams might need to respond to incidents effectively

Anti-Patterns
Within your business:
- You have not evaluated the need for intentional redundancy or fail-over systems
- Incident recovery tactics are all manual
In your documentation:
- Your BCP does not include operational support needs

Backup and Restore

Patterns
In your documentation:
- A backup and restore strategy exists for both data and metadata
At your company:
- Backups are stored in a secure location accessible only by authorized users
- Test plans and test logs show data restores are tested in a full or partial copy sandbox at least two times each year

Anti-Patterns
In your documentation:
- A backup and restore strategy does not exist or the strategy is incomplete (it applies to only data or metadata, not both)
At your company:
- Backups are not human readable
- Backups are stored in locations that unauthorized business users can access
- There is no data restoration process or the data restoration process is untested

Tools Relevant to Resilient

Tool | Description
Apex Hammer Tests | Learn about Salesforce Apex testing in current and new releases
Apex Stub API | Build a mocking framework to streamline testing
Backup and Restore | Automatically generate backups to prevent data loss
Big Objects | Store and manage large volumes of data on-platform
Field History Tracking | Track and display field history
Get Adoption and Security Insights for Your Organization | Monitor adoption and usage of Lightning Experience in your org
Manage Bulk Data Load Jobs | Create, update, or delete large volumes of records with the Bulk API
Manage Real-Time Event Monitoring Events | Manage event monitoring streaming and storage settings
Monitor Data and Storage Resources | View your Salesforce org’s storage limits and usage
Monitor Debug Logs | Monitor logs and set flags to trigger logging
Monitor Login Activity with Login Forensics | Identify behavior that may indicate identity fraud
Monitor Setup Changes with Setup Audit Trail | Track recent setup changes made by admins
Monitor Training History | View the Salesforce training classes your users have taken
Monitoring Background Jobs | Monitor background jobs in your organization
Monitoring Scheduled Jobs | View report snapshots, scheduled Apex jobs, and dashboard refreshes
Performance Assistant | Test system performance and interpret the results
Proactive Monitoring | Minimize disruptions with Salesforce monitoring services
Salesforce Data Mask | Automatically mask data in a sandbox
The System Overview Page | View usage data and limits for your organization
Use force:lightning:lint | Analyze and validate code via the CLI

Resources Relevant to Resilient

Resource | Description
7 Anti-Patterns in Performance and Scale Testing | Avoid common anti-patterns in performance and scale testing
Analyze Performance & Scale Hotspots in Complex Salesforce Apps | An approach to address performance and scalability issues in your org
Build a Disaster Recovery Plan (Trailhead) | Build a disaster recovery plan
Business Continuity is More than Backup and Restore | Take a comprehensive view of BCP
Design Standards Template | Create design standards for your organization
Diagnostics and Monitoring tools in Salesforce | Learn how to improve the quality and performance of your implementations
Guiding Principles for Continuity Planning | Review the basic principles underlying effective BCP
How to Scale Test on Salesforce | Approach scale testing in five steps
Introduction to Business Continuity Planning for Architects (Trailhead) | Get started with business continuity planning
Introduction to Performance Testing | Learn how to develop a performance testing method
Monitor Your Organization | Learn about self service monitoring options
Scale Test Strategy Checklist | Create and customize scale and performance test plans
Site Reliability Engineering At Salesforce | Learn what Salesforce SRE does and how they do it
Test Strategy Template | Ensure completeness of your test strategy
Understand Source Driven Development (Trailhead) | Learn about package development and scratch orgs
