Review PRICKLE4 by nvaulin · Pull Request #25 · Python-BI-2023/Peer_review

nvaulin · 2024-02-26T17:57:09Z

Review PRICKLE4

anshtompel · 2024-03-08T15:22:03Z

PRICKLE4.py

+    def transcribe(self):
+        """
+        Function return transcript of DNA sequence
+        """
+        transcribe = self.seq.maketrans(self.dict_trans)
+        res = self.seq.translate(transcribe)


This is slightly overcomplicated :)
You could simply replace T with U here instead of keeping a separate alphabet for transcription.

But overall this works fine too :)
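
A minimal sketch of the simpler approach suggested above, assuming the sequence is held as a plain string. The standalone `transcribe` function is a hypothetical helper for illustration, not the method from PRICKLE4.py. Two chained `str.replace` calls preserve the case of each nucleotide:

```python
def transcribe(seq: str) -> str:
    """Return the RNA transcript of a DNA sequence, preserving letter case."""
    # Replace thymine with uracil in both cases; other symbols are left untouched.
    return seq.replace("T", "U").replace("t", "u")


print(transcribe("ATGCatgc"))
```

Compared with a translation table, this needs no extra class attribute, at the cost of one pass over the string per replaced letter.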

anshtompel · 2024-03-08T15:26:15Z

PRICKLE4.py

+    alphabet = {'M', 'O', 'v', 'D', 'f', 'N', 'c', 'A', 'R', 'W', 'I', 'm', 'L', 's', 'H', 'q', 'w', 'V', 'n', 'i',
+                   'g', 'F', 'S', 'e', 'l', 'U', 'P', 'Q', 'K', 'Y', 'u', 'y', 'd', 'h', 'k', 'r', 't', 'G', 'o', 'E',
+                   'p', 'T', 'C', 'a'}
+    masses = {'A': 71.08, 'R': 156.2, 'N': 114.1, 'D': 115.1, 'C': 103.1, 'E': 129.1, 'Q': 128.1, 'G': 57.05, 'H': 137.1,
+              'I': 113.2, 'L': 113.2, 'K': 128.2, 'M': 131.2, 'F': 147.2, 'P': 97.12, 'S': 87.08, 'T': 101.1,
+              'W': 186.2, 'Y': 163.2, 'V': 99.13, 'U': 168.05, 'O': 255.3, 'a': 71.08, 'r': 156.2, 'n': 114.1, 'd': 115.1,
+              'c': 103.1, 'e': 129.1, 'q': 128.1, 'g': 57.05, 'h': 137.1, 'i': 113.2, 'l': 113.2, 'k': 128.2, 'm': 131.2,
+              'f': 147.2, 'p': 97.12, 's': 87.08, 't': 101.1, 'w': 186.2, 'y': 163.2, 'v': 99.13, 'u': 168.05, 'o': 255.3}


Having this many alphabets seems redundant. If you want to validate the sequence and compute its mass, a single dictionary is enough :)
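
One way this could look, as a sketch only: the mass table below is abbreviated to four residues (values copied from the diff), and the names `MASSES`, `is_valid`, and `total_mass` are illustrative assumptions. Normalising the case with `upper()` also removes the need for duplicate lowercase keys:

```python
# Abbreviated residue-mass table (values copied from the diff above);
# a complete table would list every amino acid once, in upper case.
MASSES = {"A": 71.08, "G": 57.05, "S": 87.08, "C": 103.1}


def is_valid(seq: str) -> bool:
    """Check that every residue, ignoring case, is a key of MASSES."""
    return set(seq.upper()).issubset(MASSES)


def total_mass(seq: str) -> float:
    """Sum the residue masses; raise ValueError on an unknown symbol."""
    if not is_valid(seq):
        raise ValueError(f"Unexpected symbol in sequence: {seq!r}")
    return sum(MASSES[residue] for residue in seq.upper())
```

The dictionary keys serve as the alphabet, so there is only one place to update when a residue is added.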

anshtompel · 2024-03-08T15:26:37Z

PRICKLE4.py

+        """
+        Function return complement sequence
+        """
+        complementary_dna = self.seq.maketrans(self.dict_comp)


Cool method :) I'll keep it in mind.

anshtompel · 2024-03-08T15:37:30Z

PRICKLE4.py

+    @abstractmethod
+    def to_print_seq(self):
+        return self.seq
+        pass
+


Abstract methods in an abstract class define an interface that the child classes inherit. The idea was that the parent class BiologicalSequence declares that child classes must be able to compute their length, support indexing, and so on. For length, for example, this could be written as

Suggested change

-    @abstractmethod
-    def to_print_seq(self):
-        return self.seq
-        pass
+    @abstractmethod
+    def __len__(self):
+        pass

Then override this stub in the child class, writing something meaningful in place of pass.
Something like that :)
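
A self-contained sketch of the pattern described in this comment. The method set (`__len__`, `__getitem__`) and the `DNASequence` body are assumptions chosen for illustration, not the actual contents of PRICKLE4.py:

```python
from abc import ABC, abstractmethod


class BiologicalSequence(ABC):
    """Interface only: states what every sequence class must support."""

    @abstractmethod
    def __len__(self):
        pass

    @abstractmethod
    def __getitem__(self, index):
        pass


class DNASequence(BiologicalSequence):
    """Concrete child that replaces the stubs with real behaviour."""

    def __init__(self, seq: str):
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        return self.seq[index]
```

With this layout, calling `BiologicalSequence()` directly raises `TypeError`, and so does instantiating any child that forgets to override one of the abstract methods.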

anshtompel · 2024-03-08T16:02:36Z

PRICKLE4.py

+        for nucl in self.seq:
+            if nucl == 'c' or nucl == 'g' or nucl == 'C' or nucl == 'G':
+                n += 1
+        return 100 * n / len(self.seq)


Instead of a loop, you can use string methods, for example

Suggested change

-        for nucl in self.seq:
-            if nucl == 'c' or nucl == 'g' or nucl == 'C' or nucl == 'G':
-                n += 1
-        return 100 * n / len(self.seq)
+        n = self.seq.upper().count('G') + self.seq.upper().count('C')
+        return 100 * n / len(self.seq)

or use a regex :)
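
The regex variant mentioned above could be sketched as follows. The standalone `gc_content` function and the choice to return 0 for an empty sequence are assumptions made for this example:

```python
import re


def gc_content(seq: str) -> float:
    """Return the GC content of seq as a percentage."""
    if not seq:
        # Assumption: report 0 for an empty sequence rather than divide by zero.
        return 0.0
    # re.IGNORECASE lets the character class match g, c, G and C.
    gc_count = len(re.findall(r"[GC]", seq, flags=re.IGNORECASE))
    return 100 * gc_count / len(seq)
```

For a two-letter search the `count`-based version is just as readable; a regex pays off mainly when the pattern grows beyond single characters.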

anshtompel · 2024-03-08T16:15:29Z

PRICKLE4.py

+        return RNASequence(res)
+
+
+class RNASequence(BiologicalSequence):


Same here: this should inherit from NucleicAcidSequence.

anshtompel · 2024-03-08T16:17:58Z

PRICKLE4.py

+        transcribe = self.seq.maketrans(self.dict_trans)
+        res = self.seq.translate(transcribe)
+
+        return RNASequence(res)


Nice that you made sure an object of a different class is returned!

anshtompel · 2024-03-08T16:25:55Z

PRICKLE4.py

+    if not os.path.isdir('fastq_filtrator_resuls'):
+        os.mkdir('fastq_filtrator_resuls')


A dedicated folder for the results is a great solution :)

anshtompel · 2024-03-08T16:32:02Z

PRICKLE4.py

+        length_bounds_both_side = (0, length_bounds)
+    else:
+        length_bounds_both_side = length_bounds
+    records = list(SeqIO.parse(input_fastq, "fastq"))


I think this approach works too :) Alternatively, you could iterate over the SeqIO records and, for example, pass only the sequence, as seq_record.seq, to the filter_lenghth function. Also, .letter_annotations works fine with the SeqRecord class on its own :)
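
A rough sketch of the iteration-based alternative. To stay runnable without Biopython, it uses a hypothetical `Record` dataclass as a stand-in for `SeqRecord` (only the `seq` and `letter_annotations` fields); in the real script the records would come from `SeqIO.parse(input_fastq, "fastq")`. The function names and default bounds are assumptions:

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for SeqRecord with only the fields used here."""
    seq: str
    letter_annotations: dict = field(default_factory=dict)


def filter_length(records: Iterable[Record], bounds=(0, 2**32)) -> Iterator[Record]:
    """Lazily yield records whose sequence length lies within bounds (inclusive)."""
    lower, upper = bounds
    for record in records:
        if lower <= len(record.seq) <= upper:
            yield record


def filter_quality(records: Iterable[Record], threshold: float = 0) -> Iterator[Record]:
    """Lazily yield records whose mean Phred quality is at least threshold."""
    for record in records:
        scores = record.letter_annotations["phred_quality"]
        if scores and sum(scores) / len(scores) >= threshold:
            yield record


reads = [
    Record("ATGC", {"phred_quality": [30, 30, 30, 30]}),
    Record("AT", {"phred_quality": [10, 10]}),
]
# Filters are chained; nothing is materialised until list() consumes them.
kept = list(filter_quality(filter_length(reads, (3, 10)), threshold=20))
```

Because each filter is a generator, the whole FASTQ file never has to be loaded into a list before filtering.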

anshtompel · 2024-03-08T16:38:04Z

PRICKLE4.py

+    SeqIO.write((record for record in filtered_records_q), output_fastq, "fastq")
+    return


Good work, I liked it :) I picked up a few things for myself. Good luck!

IuriiSl

Overall, good work. The abstract methods need a little fixing. Because of the inheritance mix-up, the alphabet check will not work for DNA and RNA. There are a couple of interesting solutions :)

IuriiSl · 2024-03-09T17:56:36Z

PRICKLE4.py

+class BiologicalSequence(ABC, str):
+
+    def __init__(self, seq):
+        self.seq = seq


The abstract class should contain only abstract methods, which define what the child classes must provide.

IuriiSl · 2024-03-09T18:18:30Z

PRICKLE4.py

+        complementary_dna = self.seq.maketrans(self.dict_comp)
+        res = self.seq.translate(complementary_dna)


Interesting implementation.

IuriiSl · 2024-03-09T18:19:12Z

PRICKLE4.py

+    alphabet = {'U', 'A', 'g', 'G', 'a', 'c', 'C', 'u'}
+    dict_comp = {'A': 'U', 'C': 'G', 'U': 'A', 'G': 'C', 'a': 'u', 'c': 'g', 'u': 'a', 'g': 'c'}


Good that both upper and lower case are handled in all the alphabets.

IuriiSl · 2024-03-09T18:24:14Z

PRICKLE4.py

+        res = self.seq.translate(complementary_dna)
+        return NucleicAcidSequence(res)
+
+class DNASequence(BiologicalSequence):


In theory, this would not work if the abstract class actually contained abstract methods. As a result, the alphabet check is also missing here.
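
A sketch of the inheritance layout the reviewers point to, with the alphabet check living in `NucleicAcidSequence` so that DNA and RNA both get it. The abstract `BiologicalSequence` parent is omitted for brevity, the method name `check_alphabet` and the DNA tables are assumptions, and the RNA tables follow the diff:

```python
class NucleicAcidSequence:
    """Shared logic for DNA and RNA; children only supply class attributes."""

    alphabet: set = set()
    dict_comp: dict = {}

    def __init__(self, seq: str):
        self.seq = seq

    def check_alphabet(self) -> bool:
        # self.alphabet resolves to the concrete child's alphabet.
        return set(self.seq) <= self.alphabet

    def complement(self):
        table = str.maketrans(self.dict_comp)
        # type(self) keeps the result in the caller's class (DNA stays DNA).
        return type(self)(self.seq.translate(table))


class DNASequence(NucleicAcidSequence):
    alphabet = set("ATGCatgc")
    dict_comp = {"A": "T", "T": "A", "G": "C", "C": "G",
                 "a": "t", "t": "a", "g": "c", "c": "g"}


class RNASequence(NucleicAcidSequence):
    alphabet = set("AUGCaugc")
    dict_comp = {"A": "U", "U": "A", "G": "C", "C": "G",
                 "a": "u", "u": "a", "g": "c", "c": "g"}
```

Returning `type(self)(...)` instead of `NucleicAcidSequence(...)` means the complement of an `RNASequence` is again an `RNASequence`.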

IuriiSl · 2024-03-09T18:26:21Z

PRICKLE4.py

+        return RNASequence(res)
+
+
+class RNASequence(BiologicalSequence):


Same as for DNASequence.

IuriiSl · 2024-03-09T18:31:04Z

PRICKLE4.py

+def filter_gc(records, gc_bounds_both_side=(0, 100)) -> list:
+    """
+    This function selects sequences with the GC content of your interest
+    :parameters:
+        records: records from fastq parced by SeqIO
+        gc_bound: interval for the of acceptable GC content, in %
+    :return:(dict) new dictionary consists of selected sequences
+    """


It looks like the docstring was not updated. The function returns a list, not a dictionary.

IuriiSl · 2024-03-09T18:31:29Z

PRICKLE4.py

+    for record in records:
+        if (length_bounds_both_side[1] >= len(record.seq) >= length_bounds_both_side[0]):
+            new_records.append(record)
+    print(new_records)


Suggested change

-    print(new_records)

IuriiSl · 2024-03-09T18:31:55Z

PRICKLE4.py

+    for record in records:
+        if (sum(record.letter_annotations["phred_quality"])/len(record.seq) >= quality_threshold):
+            new_records.append(record)
+    print(new_records)


Suggested change

-    print(new_records)

zmitserbio

Overall, the work is decent, and I noted a few things for my own future use. However, there are problems with the class implementation, which is not quite correct.
Also, it is worth running the code through a linter: in places the number of blank lines does not follow the style rules. Adding type annotations would be good too.

zmitserbio · 2024-03-10T07:22:05Z

PRICKLE4.py

+from Bio import SeqUtils
+
+
+class BiologicalSequence(ABC, str):


There was no need to inherit from str here. This is an abstract class after all, and, as it turned out, inheriting from a built-in base class was not intended here at all.

zmitserbio · 2024-03-10T07:23:10Z

PRICKLE4.py

+
+    def __init__(self, seq):
+        self.seq = seq


As the author of the code notes herself below, only abstract methods should have been left here. The same comment applies to the methods below.

zmitserbio · 2024-03-10T07:46:15Z

PRICKLE4.py

+    # did not understand, did not finish
+    @abstractmethod
+    def to_print_seq(self):
+        return self.seq
+        pass


If I understand correctly, the intent here was to implement "printing in a convenient form"? In that case, it should have been done via __str__. As I understand it, the object was supposed to have an attribute that holds the sequence string. Then it could be done like this:

Suggested change

-    # did not understand, did not finish
-    @abstractmethod
-    def to_print_seq(self):
-        return self.seq
-        pass
+    def __str__(self):
+        return self.seq

As I understand it, the logic here was different, given the inheritance from str. Such an object could also be converted to a plain string, for example by iterating over it, appending the elements to a list, and then joining the list into a string. That is not very meaningful, but if the str type specifically is needed, it could be done that way.
I will only repeat that this should not have been done in the abstract class.
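
A small sketch of the `__str__` approach, assuming a concrete class that keeps the sequence in a `seq` attribute. The class name `Sequence` and the added `__repr__` are illustrative assumptions:

```python
class Sequence:
    """Hypothetical concrete class that stores the sequence in an attribute."""

    def __init__(self, seq: str):
        self.seq = seq

    def __str__(self) -> str:
        # print() and str() both call this method.
        return self.seq

    def __repr__(self) -> str:
        # Unambiguous form for debugging and the interactive prompt.
        return f"{type(self).__name__}({self.seq!r})"


print(Sequence("ATGC"))
```

Defining `__str__` in the concrete class (or declaring it abstract in the parent) keeps the abstract class free of implementation details.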

zmitserbio · 2024-03-10T07:49:50Z

PRICKLE4.py

+class UnexpectedSymbolInSeqError(ValueError):
+    pass


Nice that a dedicated error class was created.

zmitserbio · 2024-03-10T07:57:39Z

PRICKLE4.py

+    alphabet = {'M', 'O', 'v', 'D', 'f', 'N', 'c', 'A', 'R', 'W', 'I', 'm', 'L', 's', 'H', 'q', 'w', 'V', 'n', 'i',
+                   'g', 'F', 'S', 'e', 'l', 'U', 'P', 'Q', 'K', 'Y', 'u', 'y', 'd', 'h', 'k', 'r', 't', 'G', 'o', 'E',
+                   'p', 'T', 'C', 'a'}
+    masses = {'A': 71.08, 'R': 156.2, 'N': 114.1, 'D': 115.1, 'C': 103.1, 'E': 129.1, 'Q': 128.1, 'G': 57.05, 'H': 137.1,
+              'I': 113.2, 'L': 113.2, 'K': 128.2, 'M': 131.2, 'F': 147.2, 'P': 97.12, 'S': 87.08, 'T': 101.1,
+              'W': 186.2, 'Y': 163.2, 'V': 99.13, 'U': 168.05, 'O': 255.3, 'a': 71.08, 'r': 156.2, 'n': 114.1, 'd': 115.1,
+              'c': 103.1, 'e': 129.1, 'q': 128.1, 'g': 57.05, 'h': 137.1, 'i': 113.2, 'l': 113.2, 'k': 128.2, 'm': 131.2,
+              'f': 147.2, 'p': 97.12, 's': 87.08, 't': 101.1, 'w': 186.2, 'y': 163.2, 'v': 99.13, 'u': 168.05, 'o': 255.3}


It is worth noting that, although this solution is not wrong, alphabet is not a necessary variable here (by the way, constants like this should be written in upper case). After all, you can use set(masses.keys()). This is not an error, of course, but for the future it is somewhat more economical not to store redundant information (especially since in real cases it can weigh gigabytes).

zmitserbio · 2024-03-10T08:27:56Z

PRICKLE4.py

+    :parameters:
+        records: records from fastq parced by SeqIO
+        gc_bound: interval for the of acceptable GC content, in %
+    :return:(dict) new dictionary consists of selected sequences


The return type should be corrected.

zmitserbio · 2024-03-10T08:29:17Z

PRICKLE4.py

+    :parameters:
+        records: records from fastq parced by SeqIO
+        length_bound: interval for the of acceptable sequense length in number of nucleotide
+    :return:(dict) new dictionary consists of selected sequences


Same here.

zmitserbio · 2024-03-10T08:30:22Z

PRICKLE4.py

+    parameters:
+        seqs: dictionary of FASTQ sequences {name: (sequence, quality)}
+        quality_treshold: threshold value for average quality per nucleotide (phred33 scale)
+    :return:(dict) recordes for selected sequences


And once more.

zmitserbio · 2024-03-10T08:32:56Z

PRICKLE4.py

+    for record in records:
+        if (length_bounds_both_side[1] >= len(record.seq) >= length_bounds_both_side[0]):
+            new_records.append(record)
+    print(new_records)


It is not clear why there is a print here... Probably forgot to remove it?

zmitserbio · 2024-03-10T08:33:31Z

PRICKLE4.py

+    for record in records:
+        if (sum(record.letter_annotations["phred_quality"])/len(record.seq) >= quality_threshold):
+            new_records.append(record)
+    print(new_records)


And a print again.

Add PRICKLE4.py

35c182f

4 participants