Review TAOK3 by nvaulin · Pull Request #23 · Python-BI-2023/Peer_review

nvaulin · 2024-02-26T17:54:42Z

Review TAOK3

nekrasovadasha22

Good work, I liked the code, and it is clear you put in a lot of effort! The only thing I did not understand is why the iteration in the last function starts from the second amino acid rather than the first.
Constants in Python are conventionally written in uppercase, let's not forget that. Thanks for the work!

nekrasovadasha22 · 2024-03-06T17:30:07Z

TAOK3.py

+    For more information please see README
+
+    """
+    if type(length_bounds) is int:


You could also check for None, in case the user passes that value for these parameters, and assign the default value when it is None.

Yes, I didn't think of that.
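A minimal sketch of the None handling suggested above. The helper name `normalize_bounds` and the default upper bound are hypothetical and not taken from TAOK3.py; only the "single number means (0, number)" rule comes from the reviewed snippet.

```python
def normalize_bounds(bounds, default=(0, 2**32)):
    """Return a (lower, upper) pair from None, a single number, or a pair.

    Hypothetical helper: the name and the default value are illustrative only.
    """
    if bounds is None:
        # Fall back to the default when the caller explicitly passes None.
        return default
    if isinstance(bounds, (int, float)):
        # A single number is treated as the upper bound, as in the reviewed code.
        return 0, bounds
    lower, upper = bounds
    return lower, upper
```

Keeping the normalization in one helper means the same rule can be reused for both `length_bounds` and `gc_bounds`.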

nekrasovadasha22 · 2024-03-06T17:30:58Z

TAOK3.py

+    if os.path.exists(output_path):
+        error = 'File with such name exists! Change output_filename arg!'
+        raise ValueError(error)
+    SeqIO.write(good_reads, output_path, "fastq")


Good work!

nekrasovadasha22 · 2024-03-06T17:43:40Z

TAOK3.py

+        gc_counter = 0
+        for nucl in self.seq:
+            if nucl in ('G', 'C', 'g', 'c'):
+                gc_counter += 1


Here you could use .upper() or .lower() to avoid comparing both cases, or move these values into constants so that a new tuple is not created on every iteration.

Yes, I completely agree :)

Can't argue with that :)
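A possible shape for both suggestions combined, as a sketch only. The function name `gc_content`, the constant name `GC_NUCLEOTIDES`, and the percentage return value are assumptions; the reviewed snippet shows only the counting loop. In CPython a tuple literal made of constants is typically folded into a single constant, so the benefit here is likely readability more than speed.

```python
# Module-level constant, built once (name is hypothetical).
GC_NUCLEOTIDES = frozenset('GC')


def gc_content(seq: str) -> float:
    """Return the share of G and C in seq as a percentage (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    # Upper-case once so that only two symbols need to be compared.
    gc_counter = sum(nucl in GC_NUCLEOTIDES for nucl in seq.upper())
    return gc_counter / len(seq) * 100
```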

nekrasovadasha22 · 2024-03-06T17:47:00Z

TAOK3.py

+    """
+    The class for RNA sequences
+    """
+    rule_complement = 'AUCG'.maketrans('AaUuCcGg', 'UuAaGgCc')


Interesting implementation!

Thanks :)

nekrasovadasha22 · 2024-03-06T17:50:26Z

TAOK3.py

+        """
+        transcribe_seq = self.seq.translate(self.rule_transcription)
+        rna_seq = RNASequence(transcribe_seq)
+        return rna_seq


Very elegant!

nekrasovadasha22 · 2024-03-06T17:51:49Z

TAOK3.py

+    """
+    The class for amino acid sequences
+    """
+    aa_alphabet = 'ACDEFGHIKLMNPQRSTVWY'


Constants are written in uppercase.

Are these really constants, though? I thought about that too, but in this case our constants have become attributes. Aren't those always written in lowercase?

Hmm, this is probably a matter of taste. I think that if there is a value you set once and never change, it is still a constant. Whereas the initial values you pass in def init, for example, are attributes that you define.

In Python, "constants" refers specifically to things placed at the top of a module. Here these are just class attributes, so yes, plain lowercase.

Note that when you write code in a PR comment, markdown eats it. Your __init__ turned into a bold init. To keep it intact, wrap it in backticks (the key with the letter ё on a Russian keyboard).

nekrasovadasha22 · 2024-03-06T18:04:19Z

TAOK3.py

+        """
+        alternative_frames = []
+        num_position = 0
+        for amino_acid in self.seq[1:-3]:


Is it intentional that we skip the first amino acid here?

Yes, that is intentional :) The standard reading frame starts at the first amino acid, and we are looking for alternative ones :)

Great!

uzolotikov

Thank you so much for the work. If only everyone wrote such concise code and such thoroughly detailed, annotated docstrings!
10/10

uzolotikov · 2024-03-09T11:58:52Z

TAOK3.py

+    Example: output_filename='result'  # 'result.fastq'
+             output_filename='result.fastq'


Including examples is a very good idea; they are often sorely missing.

uzolotikov · 2024-03-09T12:05:21Z

TAOK3.py

+    if output_filename is None:
+        input_filename = os.path.split(input_path)[-1]
+        output_filename = input_filename
+    if not (output_filename.endswith('.fastq')):
+        output_filename = output_filename + '.fastq'
+    current_directory = os.getcwd()
+    path = os.path.join(current_directory, 'fastq_filtrator_results')
+    if not (os.path.exists(path)):
+        os.mkdir(path)
+    output_path = os.path.join(path, output_filename)
+    if os.path.exists(output_path):
+        error = 'File with such name exists! Change output_filename arg!'
+        raise ValueError(error)
+    SeqIO.write(good_reads, output_path, "fastq")


💥💥💥🔥🔥🔥

uzolotikov · 2024-03-09T12:20:30Z

TAOK3.py

+    """
+    The class for RNA sequences
+    """
+    rule_complement = 'AUCG'.maketrans('AaUuCcGg', 'UuAaGgCc')


I'll take note of this, nice method!

uzolotikov · 2024-03-09T12:30:02Z

TAOK3.py

+
+        Return: DNAsequence or RNAsequence - the result sequence object
+        """
+        complement = self.seq.translate(type(self).rule_complement)


It would be good to also handle the error case where complement is called on NucleicAcidSequence itself (NotImplementedError or something similar, telling the user to choose between DNASequence and RNASequence).
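One way this guard could look, as a sketch. The class bodies below are assumptions reconstructed from the fragments visible in this review (`rule_complement`, `type(self)(complement)`); the actual classes in TAOK3.py may be organized differently, and the error message is invented for illustration.

```python
class NucleicAcidSequence:
    # Assumed convention: the base class has no table, subclasses override it.
    rule_complement = None

    def __init__(self, seq):
        self.seq = seq

    def complement(self):
        if type(self).rule_complement is None:
            raise NotImplementedError(
                'Use DNASequence or RNASequence instead of NucleicAcidSequence'
            )
        return type(self)(self.seq.translate(type(self).rule_complement))


class DNASequence(NucleicAcidSequence):
    rule_complement = str.maketrans('AaTtCcGg', 'TtAaGgCc')


# Calling complement() on the base class now fails with a clear message.
try:
    NucleicAcidSequence('ATGC').complement()
except NotImplementedError as error:
    message = str(error)
```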

uzolotikov · 2024-03-09T12:32:25Z

TAOK3.py

+        """
+        return set(self.seq.upper()) <= set(self.aa_alphabet)
+
+    def search_for_alt_frames(self, alt_start_aa='M') -> list:


Nice function :)

uzolotikov · 2024-03-09T12:41:04Z

TAOK3.py

+    if type(length_bounds) is int:
+        length_bounds = 0, length_bounds
+    if type(gc_bounds) is int or type(gc_bounds) is float:
+        gc_bounds = 0, gc_bounds


It's great that type checking was added!

Important: if the user passes a tuple as gc_bounds, the check will break; you need to decide what to do in that case...
Minor: isinstance is just begging to be used here :)

Suggested change

-    if type(length_bounds) is int:
-        length_bounds = 0, length_bounds
-    if type(gc_bounds) is int or type(gc_bounds) is float:
-        gc_bounds = 0, gc_bounds
+    if isinstance(length_bounds, int):
+        length_bounds = 0, length_bounds
+    if isinstance(gc_bounds, int) or isinstance(gc_bounds, float):
+        gc_bounds = 0, gc_bounds
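One behavioural difference worth keeping in mind when replacing `type(x) is int` with `isinstance(x, int)`: `bool` is a subclass of `int`, so `isinstance` also accepts `True` and `False`. The sketch below illustrates this; the helper name `is_number` is hypothetical.

```python
value = True

exact_match = type(value) is int          # False: exact type comparison
subclass_match = isinstance(value, int)   # True: bool is a subclass of int


def is_number(x):
    """Accept int and float, but reject bool (hypothetical helper)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)
```

Whether booleans should be rejected is a design decision for the author; the sketch only shows how to do it if desired.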

uzolotikov · 2024-03-09T12:45:08Z

TAOK3.py

+    if len(good_reads) == 0:
+        raise ValueError('There are no sequences suited to requirements')


It seems better to simply print a warning to stdout about the specific file. It would be nice to run the filter over several files in a loop... But if even one of them contains no good sequences, the loop will stop with an error, which is unpleasant.

Suggested change

-    if len(good_reads) == 0:
-        raise ValueError('There are no sequences suited to requirements')
+    if len(good_reads) == 0:
+        print(f'In {input_path} there is no sequences suited to requirements')
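A sketch of the batch scenario described above. It swaps the suggested `print` for `warnings.warn`, which writes to stderr and can be filtered or turned into an error by the caller; this is a substitution, not what the reviewer proposed. `filter_reads`, the length criterion, and the sample data are toy stand-ins, not the real filter from TAOK3.py.

```python
import warnings


def filter_reads(reads, min_length, source='<unknown>'):
    """Toy stand-in for the real filter: keep reads of at least min_length."""
    good_reads = [read for read in reads if len(read) >= min_length]
    if not good_reads:
        # Warn instead of raising, so a loop over many files keeps going.
        warnings.warn(f'In {source} there are no sequences suited to requirements')
    return good_reads


# Hypothetical in-memory "files" to show that the loop is not interrupted.
samples = {'a.fastq': ['ATGC', 'AT'], 'b.fastq': ['A']}
results = {
    name: filter_reads(reads, 3, source=name)
    for name, reads in samples.items()
}
```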

uzolotikov · 2024-03-09T12:45:58Z

TAOK3.py

+    def __init__(self, seq):
+        self.seq = seq


For consistency, it would be good to do self.seq = seq.upper() (or .lower()) here: then you won't need to normalize the case later or compare several cases as on line 246.

Suggested change

-    def __init__(self, seq):
-        self.seq = seq
+    def __init__(self, seq):
+        self.seq = seq.upper()

uzolotikov · 2024-03-09T12:46:50Z

TAOK3.py

+        Return: DNAsequence or RNAsequence - the result sequence object
+        """
+        complement = self.seq.translate(type(self).rule_complement)
+        complement_seq = type(self)(complement)


As an alternative, you could use self.__class__(complement) (the two pairs of parentheses look a bit confusing, but this is a matter of taste).

Suggested change

-        complement_seq = type(self)(complement)
+        complement_seq = self.__class__(complement)
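Both spellings construct a new object of the caller's own class, so a method defined on the base class returns the subclass when called on a subclass instance. A small sketch with hypothetical class and method names (`BaseSequence`, `DNA`, `reverse`):

```python
class BaseSequence:
    def __init__(self, seq):
        self.seq = seq

    def reverse(self):
        # type(self) resolves to the runtime class of the instance.
        return type(self)(self.seq[::-1])

    def reverse_alt(self):
        # For ordinary classes self.__class__ gives the same result.
        return self.__class__(self.seq[::-1])


class DNA(BaseSequence):
    pass


dna = DNA('ATGC')
```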

PolinaVaganova

The code is well structured and readable. Function and variable names are clear. The function documentation is excellently written. I enjoyed reviewing it.

PolinaVaganova · 2024-03-10T10:34:53Z

TAOK3.py

+    if type(length_bounds) is int:
+        length_bounds = 0, length_bounds
+    if type(gc_bounds) is int or type(gc_bounds) is float:
+        gc_bounds = 0, gc_bounds


Suggested change

-    if type(length_bounds) is int:
-        length_bounds = 0, length_bounds
-    if type(gc_bounds) is int or type(gc_bounds) is float:
-        gc_bounds = 0, gc_bounds
+    if isinstance(length_bounds, int):
+        length_bounds = 0, length_bounds
+    if isinstance(gc_bounds, (int, float)):
+        gc_bounds = 0, gc_bounds

This is a bit simpler.

PolinaVaganova · 2024-03-10T10:37:36Z

TAOK3.py

+            is_above_quality_threshold(seq_record, quality_threshold)):
+            good_reads += [seq_record]
+    if len(good_reads) == 0:
+        raise ValueError('There are no sequences suited to requirements')


It's great that there is an error for this case.

PolinaVaganova · 2024-03-10T10:38:38Z

TAOK3.py

+        input_filename = os.path.split(input_path)[-1]
+        output_filename = input_filename
+    if not (output_filename.endswith('.fastq')):
+        output_filename = output_filename + '.fastq'


Another good check.

PolinaVaganova · 2024-03-10T10:39:23Z

TAOK3.py

+        os.mkdir(path)
+    output_path = os.path.join(path, output_filename)
+    if os.path.exists(output_path):
+        error = 'File with such name exists! Change output_filename arg!'


One more excellent check!

PolinaVaganova · 2024-03-10T10:42:48Z

TAOK3.py

+        gc_counter = 0
+        for nucl in self.seq:
+            if nucl in ('G', 'C', 'g', 'c'):
+                gc_counter += 1


Can't argue with that :)

PolinaVaganova · 2024-03-10T10:45:11Z

TAOK3.py

+    """
+    rule_complement = 'ATCG'.maketrans('AaTtCcGg', 'TtAaGgCc')
+    rule_transcription = 'AUCG'.maketrans('Tt', 'Uu')
+


Wow, this is really cool!
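For readers unfamiliar with the technique: `str.maketrans` is a static method, so the string it is called on (`'ATCG'` or `'AUCG'` in the snippet) does not affect the resulting table. A short sketch; the constant names `COMPLEMENT` and `TRANSCRIPTION` are hypothetical.

```python
# Translation tables map code points of the first string to the second.
COMPLEMENT = str.maketrans('AaTtCcGg', 'TtAaGgCc')
TRANSCRIPTION = str.maketrans('Tt', 'Uu')

complemented = 'ATgc'.translate(COMPLEMENT)      # 'TAcg'
transcribed = 'ATgc'.translate(TRANSCRIPTION)    # 'AUgc'

# Calling maketrans on an arbitrary string instance builds the same table.
same_table = 'ATCG'.maketrans('Tt', 'Uu') == TRANSCRIPTION
```

Because case is handled inside the table, the original letter case of the sequence is preserved.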

Add TAOK3.py

735f21d
