Review DIABLO by nvaulin · Pull Request #19 · Python-BI-2023/Peer_review

nvaulin · 2024-02-26T17:54:10Z

Review DIABLO

alibibio

Done fairly well, but overall the code feels overcomplicated by unnecessary steps.
Instead of repeatedly checking the DNA/RNA type, it would be better to use polymorphism.
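To illustrate the polymorphism suggestion, here is a minimal sketch. The class names `NucleicAcidSketch`, `DNASketch` and `RNASketch` are hypothetical and are not taken from DIABLO.py; the idea is only that each subclass carries its own pairing table, so the shared method never has to ask which molecule type it is dealing with.

```python
class NucleicAcidSketch:
    """Hypothetical base class: subclasses only supply their pairing table."""

    complement_pairs: dict = {}

    def __init__(self, seq: str) -> None:
        self.seq = seq

    def complement(self) -> str:
        # No "is this DNA or RNA?" check: the table comes from the subclass.
        return ''.join(self.complement_pairs[base] for base in self.seq)


class DNASketch(NucleicAcidSketch):
    complement_pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


class RNASketch(NucleicAcidSketch):
    complement_pairs = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}
```

With this layout, `DNASketch('ATGC').complement()` and `RNASketch('AUGC').complement()` go through the same method and differ only in the class-level dictionary.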

alibibio · 2024-03-03T14:48:52Z

DIABLO.py

+def filter_fastq(input_path: str, gc_bounds: Tuple[int, int] = (0, 100),
+                 length_bounds: Tuple[int, int] = (0, 2 ** 32), quality_threshold: int = 0,
+                 output_filename: str = None) -> NoReturn:


Why not use the plain form: gc_bounds: tuple ?

This way we annotate not only the tuple itself (which gives only limited information) but also its contents (we state that they are specifically integers). This is good practice.
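A small sketch of what the more specific annotation buys. The helper `in_bounds` is hypothetical and only mirrors the shape of the `gc_bounds` parameter; with `Tuple[int, int]`, a static type checker can flag a call such as `in_bounds(50, ("a", "b"))`, which a bare `tuple` annotation would accept.

```python
from typing import Tuple, get_type_hints


def in_bounds(value: float, bounds: Tuple[int, int] = (0, 100)) -> bool:
    """Hypothetical helper: check that value lies within an inclusive range."""
    lower, upper = bounds
    return lower <= value <= upper


# The annotation records both the container and the element types.
hints = get_type_hints(in_bounds)
```

On Python 3.9 and later the built-in generic `tuple[int, int]` expresses the same thing without importing from `typing`.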

alibibio · 2024-03-03T14:50:30Z

DIABLO.py

+    if not os.path.isdir("fastq_filtrator_resuls"):
+        os.mkdir("fastq_filtrator_resuls")


I like that a directory for the results is provided.

alibibio · 2024-03-03T14:54:48Z

DIABLO.py

+    with open(f'fastq_filtrator_resuls/{output_filename}.fastq', mode='w'):
+        pass


A strange way to create the file. You could open it at the end and write the results right away.

Agreed.

alibibio · 2024-03-03T15:00:01Z

DIABLO.py

+        max_gc = gc_bounds
+
+    for record in SeqIO.parse(open(input_path), "fastq"):
+        name, seq, description, quality = record.id, record.seq, record.description, record.letter_annotations[


Redundant variables. name and description are not used, and the others are used only once.

alibibio · 2024-03-03T15:01:12Z

DIABLO.py

+            with open(f'fastq_filtrator_resuls/{output_filename}.fastq', mode='a') as new_file:
+                new_file.write(f'{record.format("fastq")} \n')


It is better to open the file once before the loop than to reopen it on every iteration.
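A minimal sketch of the single-open approach, under stated assumptions: the function name `write_passing`, its parameters, and the use of plain strings as records are all hypothetical (the original works with Biopython records). Opening in `'w'` mode once also creates or truncates the file, so the separate empty `open(..., mode='w')` step is no longer needed, and `os.makedirs(..., exist_ok=True)` replaces the `isdir`/`mkdir` pair.

```python
import os
from typing import Callable, Iterable


def write_passing(records: Iterable[str], keep: Callable[[str], bool],
                  output_dir: str, output_filename: str) -> int:
    """Hypothetical helper: write records that pass `keep`, return their count."""
    # Create the results directory if it is missing; no error if it exists.
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, f'{output_filename}.fastq')
    written = 0
    # The file is opened once, before the loop, and closed once after it.
    with open(output_path, mode='w') as new_file:
        for record in records:
            if keep(record):
                new_file.write(record)
                written += 1
    return written
```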

alibibio · 2024-03-03T15:05:26Z

DIABLO.py

+                new_file.write(f'{record.format("fastq")} \n')
+
+
+class BiologicalSequence(str):


The __len__ and __getitem__ methods are not implemented.
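For reference, a sketch of how these two methods might look on a class that stores the sequence in `self.seq`. The class name `SequenceSketch` is hypothetical, and unlike the class in the diff it does not subclass `str`; returning a new instance for slices is one possible design choice, not something the PR prescribes.

```python
class SequenceSketch:
    """Hypothetical container that delegates length and indexing to self.seq."""

    def __init__(self, seq: str) -> None:
        self.seq = seq

    def __len__(self) -> int:
        return len(self.seq)

    def __getitem__(self, index):
        result = self.seq[index]
        # A slice yields a new object of the same class; an int yields one character.
        if isinstance(index, slice):
            return type(self)(result)
        return result

    def __repr__(self) -> str:
        return f'Sequence: {self.seq}'
```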

alibibio · 2024-03-03T15:08:52Z

DIABLO.py

+        for base in self.seq:
+            res.append(self.complement_pairs[base] if base.islower() else self.complement_pairs[base.lower()].upper())


You could call upper right away.

alibibio · 2024-03-03T15:12:23Z

DIABLO.py

+        cnt = Counter(self.seq.upper())
+        return round((cnt['C'] + cnt['G']) / len(self.seq), 4)


It would be better to rewrite this using the count method.
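A sketch of the same calculation with `str.count`. The standalone function `gc_content` is hypothetical (the original is a method), and returning `0.0` for an empty sequence is an added assumption; the code in the diff would raise `ZeroDivisionError` in that case.

```python
def gc_content(seq: str) -> float:
    """Hypothetical helper: fraction of G and C bases, rounded to 4 places."""
    if not seq:
        # Assumption: treat an empty sequence as 0.0 rather than raising.
        return 0.0
    upper_seq = seq.upper()
    gc_count = upper_seq.count('G') + upper_seq.count('C')
    return round(gc_count / len(upper_seq), 4)
```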

greenbergM

There are many interesting solutions in the code that I had not really come across before. There are a few misunderstandings (the missing len and getitem methods have already been pointed out before me), but it is good work :)

greenbergM · 2024-03-10T01:44:40Z

DIABLO.py

+AA_SET = set('FLIMVSPTAYHQNKDECWRG')
+DNA_NUCLEOTIDES = set('ATGC')
+RNA_NUCLEOTIDES = set('AUGC')
+PAIRS_DNA = {'a': 't', 't': 'a', 'c': 'g', 'g': 'c'}


I think it would be more concise to define the complementary-pair dictionaries with uppercase letters, since the nucleotides and amino acids are also written in caps.
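One way this could look, as a sketch: an uppercase `PAIRS_DNA` dictionary plus a translation table derived from it that covers both cases. The names `_DNA_TABLE` and `complement_dna` are hypothetical. Note one behavioural difference from the loop in the diff: `str.translate` leaves unknown characters unchanged instead of raising `KeyError`.

```python
# Uppercase pairing table, matching the style of DNA_NUCLEOTIDES.
PAIRS_DNA = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Hypothetical derived table: add lowercase pairs so letter case is preserved.
_DNA_TABLE = str.maketrans({
    **PAIRS_DNA,
    **{key.lower(): value.lower() for key, value in PAIRS_DNA.items()},
})


def complement_dna(seq: str) -> str:
    """Hypothetical helper: complement a DNA string, keeping letter case."""
    # Characters outside the table pass through unchanged.
    return seq.translate(_DNA_TABLE)
```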

greenbergM · 2024-03-10T01:54:34Z

DIABLO.py

+        max_length = length_bounds
+
+    try:
+        min_gc, max_gc = gc_bounds


I didn't even understand at first how this works... It's actually elegant, a cool idea)

The idea is interesting, but it is still better not to overuse try-except. If we want to handle particular cases, that is a job for if/else. try-except is for errors, and it is not great to build constructs like "if A, there will be an error, and if there is an error, then B". Better to write "if A, then B" directly.
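A sketch of the "if A, then B" form for the bounds handling. It assumes, based on the visible diff lines, that a single number is meant as the upper bound with the lower bound defaulting to 0; the function name `normalize_bounds` and its `default_min` parameter are hypothetical.

```python
from typing import Tuple, Union

Number = Union[int, float]


def normalize_bounds(bounds: Union[Number, Tuple[Number, Number]],
                     default_min: Number = 0) -> Tuple[Number, Number]:
    """Hypothetical helper: turn a single upper bound or a pair into (min, max)."""
    # Explicit branch instead of relying on a TypeError from tuple unpacking.
    if isinstance(bounds, (int, float)):
        return default_min, bounds
    lower, upper = bounds
    return lower, upper
```

The same helper could then serve both `gc_bounds` and `length_bounds`, which would also remove the duplicated unpacking logic.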

greenbergM · 2024-03-10T01:55:43Z

DIABLO.py

+    """
+    if not os.path.isdir("fastq_filtrator_resuls"):
+        os.mkdir("fastq_filtrator_resuls")
+    with open(f'fastq_filtrator_resuls/{output_filename}.fastq', mode='w'):


A typo?

If you see a typo, it is probably better to fix it right in the PR so that others don't have to look for it.

greenbergM · 2024-03-10T02:03:47Z

DIABLO.py

+                min_gc <= gc <= max_gc and \
+                q_seq >= quality_threshold:
+            with open(f'fastq_filtrator_resuls/{output_filename}.fastq', mode='a') as new_file:
+                new_file.write(f'{record.format("fastq")} \n')


An interesting approach with repeatedly opening the file in append mode, but I think opening it once for writing would be simpler and more readable.

greenbergM · 2024-03-10T02:05:22Z

DIABLO.py

+
+    def __repr__(self):
+        return f'Sequence: {self.seq}'
+


I didn't write a repr myself, but I should have...

greenbergM · 2024-03-10T02:10:51Z

DIABLO.py

+            float: The GC content of the sequence.
+        """
+        cnt = Counter(self.seq.upper())
+        return round((cnt['C'] + cnt['G']) / len(self.seq), 4)


You could have used the built-in function from Biopython.

greenbergM · 2024-03-10T02:14:46Z

DIABLO.py

+        seq (str): The biological sequence.
+        mol_type (str): The type of molecule in the sequence (DNA, RNA, AA_seq).
+    """
+


I didn't quite see the need for this class, given that BiologicalSequence has no other sequence variants anyway. Yes, BiologicalSequence would then become bulkier, but it seems to me it would be logical to put this molecule-type assignment functionality there.

greenbergM · 2024-03-10T02:16:21Z

DIABLO.py

+        """
+        res = []
+        for base in self.seq:
+            res.append(self.complement_pairs[base] if base.islower() else self.complement_pairs[base.lower()].upper())


Here, if the dictionaries at the top were in uppercase, this could be shortened a bit.

greenbergM · 2024-03-10T02:20:46Z

DIABLO.py

+                summ_charge.extend([PK3[key] for _ in range(value)])
+            except KeyError:
+                pass
+


All good!

Add DIABLO.py

ec95add

alibibio reviewed Mar 3, 2024


greenbergM reviewed Mar 10, 2024
