0% completed
Problem Statement
You have a gene sequence
represented by an 8-character
string, containing only 'A'
, 'C'
, 'G'
, or 'T'
characters. To change from one gene sequence to another, you can mutate one character at a time.
For example, "AAAAACCC" --> "AAAACCCC"
is one mutation.
Each intermediate sequence must be a valid
gene in the provided list (bank).
The gene bank bank
records all the valid gene mutations. A gene must be in bank
to make it a valid gene string.
Given the two gene strings startGene
and endGene
and the gene bank
, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.
Note: The starting point is assumed to be valid, so it might not be included in the bank.
Examples
Example 1:
- Input:
startGene
: "AACCGGTT",endGene
: "AAACGGTA",bank
: ["AACCGGTA", "AACCGCTA", "AAACGGTA"] - Expected Output: 2
- Justification:
- Mutations: "AACCGGTT" → "AACCGGTA" → "AAACGGTA"
Example 2:
- Input:
startGene
: "AAAAACCC",endGene
: "AACCCCCC",bank
: ["AAAACCCC", "AAACCCCC", "AACCCCCC"] - Expected Output: 3
- Justification:
- Mutations: "AAAAACCC" → "AAAACCCC" → "AAACCCCC" → "AACCCCCC"
Example 3:
- Input:
startGene
: "AACCGGTT",endGene
: "CCGGTTAA",bank
: ["AACCGGTA", "AACCGCTA", "AAACGGTA", "CCGGTTAA"] - Expected Output: -1
- Justification: We need to mutate a total of 8 characters, but the 'bank' array contains only 4 strings. So, it is not possible to perform 8 mutations.
Constraints:
- 0 <= bank.length <= 10
- startGene.length == endGene.length == bank[i].length == 8
- startGene, endGene, and bank[i] consist of only the characters ['A', 'C', 'G', 'T'].
Solution
To solve this problem, we can use a breadth-first search (BFS) approach. BFS is ideal for finding the shortest path in an unweighted graph, which in this case, is represented by gene sequences as nodes and mutations as edges. Starting from startGene, we will explore all possible single mutations and use a queue to keep track of the current gene sequence and the number of mutations. We will use a set to store the bank for quick lookups and another set to track visited sequences to avoid cycles.
This approach ensures we explore all possible paths efficiently and find the shortest mutation sequence if one exists.
Step-by-Step Algorithm
-
Initialization:
- Convert the bank array into a set for O(1) look-up.
- Create a queue for BFS and initialize it with a tuple containing the startGene and a mutation count of 0.
- Create a set to keep track of visited genes to avoid revisiting them.
-
BFS Traversal:
- While the queue is not empty:
- Dequeue the front element, which includes the current gene sequence and the mutation count.
- If the current gene sequence matches the endGene, return the mutation count as the result.
- For each position in the gene sequence:
- Save the original character at this position.
- Replace this character with each of 'A', 'C', 'G', and 'T':
- Form a new gene sequence by replacing the character.
- If the new gene sequence is in the bank set and has not been visited:
- Mark this new gene sequence as visited.
- Enqueue this new gene sequence along with an incremented mutation count.
- Restore the original character at this position.
- While the queue is not empty:
-
Return Statement:
- If the queue is exhausted and no mutation path is found, return -1.
Algorithm Walkthrough
Using the following input:
- startGene: "AACCGGTT"
- endGene: "AAACGGTA"
- bank: ["AACCGGTA", "AACCGCTA", "AAACGGTA"]
-
Initialization:
- bankSet: {"AACCGGTA", "AACCGCTA", "AAACGGTA"}
- queue: [("AACCGGTT", 0)]
- visited: {}
-
First Iteration:
- Queue before processing: [("AACCGGTT", 0)]
- Dequeue ("AACCGGTT", 0)
- Current gene: "AACCGGTT"
- Mutation count: 0
- For each character in "AACCGGTT":
- Position 0:
- Try 'A', 'C', 'G', 'T' (no change for 'A')
- Position 1:
- Try 'A', 'C', 'G', 'T' (no change for 'A')
- Position 2:
- Try 'A', 'C', 'G', 'T' (change to "AAGCGGTT", "AAACGGTT", "AATCGGTT")
- "AAGCGGTT", "AAACGGTT", "AATCGGTT" is not in the bankSet
- Position 3:
- Try 'A', 'C', 'G', 'T' (change to "AACCAGTT", "AACCCGTT", "AACCTGTT")
- "AACCAGTT", "AACCCGTT", "AACCTGTT" is not in the
bank
set.
- Position 4:
- Try 'A', 'C', 'G', 'T' (no change for 'G')
- Position 5:
- Try 'A', 'C', 'G', 'T' (no change for 'G')
- Position 6:
- Try 'A', 'C', 'G', 'T' (no change for 'T')
- Position 7:
- Try 'A', 'C', 'G', 'T'
- "AACCGGTA" is in the bankSet and not visited
- Enqueue ("AACCGGTA", 1)
- Mark "AACCGGTA" as visited
- Position 0:
- Queue after processing: [("AACCGGTA", 1)]
- Visited: {"AACCGGTA"}
-
Second Iteration:
- Queue before processing: [("AACCGGTA", 1)]
- Dequeue ("AACCGGTA", 1)
- Current gene: "AACCGGTA"
- Mutation count: 1
- For each character in "AACCGGTA":
- Position 0:
- Try 'A', 'C', 'G', 'T' (no change for 'A')
- Position 1:
- Try 'A', 'C', 'G', 'T' (no change for 'A')
- Position 2:
- Try 'A', 'C', 'G', 'T' (change to "AAGCGGTA", "AAACGGTA", "AATCGGTA")
- "AAACGGTA" is in the bankSet and not visited
- Enqueue ("AAACGGTA", 2)
- Mark "AAACGGTA" as visited
- Position 3:
- Try 'A', 'C', 'G', 'T' (no change for 'C')
- Position 4:
- Try 'A', 'C', 'G', 'T' (no change for 'G')
- Position 5:
- Try 'A', 'C', 'G', 'T' (no change for 'G')
- Position 6:
- Try 'A', 'C', 'G', 'T' (no change for 'T')
- Position 7:
- Try 'A', 'C', 'G', 'T' (no change for 'A')
- Position 0:
- Queue after processing: [("AAACGGTA", 2)]
- Visited: {"AACCGGTA", "AAACGGTA"}
-
Third Iteration:
- Queue before processing: [("AAACGGTA", 2)]
- Dequeue ("AAACGGTA", 2)
- Current gene: "AAACGGTA"
- Mutation count: 2
- Check if the current gene matches the endGene:
- It matches "AAACGGTA"
- Return 2 as the result
The minimum number of mutations needed to change from "AACCGGTT" to "AAACGGTA" using the given bank is 2.
Code
Complexity Analysis
Time Complexity
- Initialization of the bank set and visited set: O(N), where N is the number of genes in the bank.
- BFS Traversal:
- Each gene string has a fixed length of 8 characters.
- For each gene string, we try changing each of the 8 characters to 'A', 'C', 'G', or 'T' (4 possibilities).
- Checking if a mutated gene is in the bank set takes O(1) time.
- Therefore, each gene has O(8 * 4) = O(32) possible mutations, which simplifies to O(1) because it's a constant.
- In the worst case, we might visit all N genes.
- Hence, the total time complexity is O(N).
Space Complexity
- Storage for the bank set and visited set: O(N)
- Queue for BFS traversal: O(N)
- Storage for the gene strings in the queue: O(N * 8), which simplifies to O(N) because the length of each gene is constant (8).
- Therefore, the total space complexity is O(N).
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible