10 Introgressed segments and marker development
I. Introgressed segments
0. Rename chromosomes
sh /share/nas1/yuj/script/ddradANDreseqANDgenome/vcf_tools/remap_chr.sh <old_vcf> <id_map> <new_vcf>
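remap_chr.sh itself is not reproduced here. As a rough illustration of what a chromosome-renaming step does, the Python sketch below assumes the ID map is a plain two-column "old new" file (an assumption about its format) and rewrites both the CHROM column and the ##contig header IDs. The helper names are hypothetical. For comparison, `bcftools annotate --rename-chrs <map>` performs the same kind of renaming.

```python
import re

def load_chr_map(lines):
    """Parse whitespace-separated 'old new' pairs; blank lines and '#' comments are skipped."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        old, new = line.split()[:2]
        mapping[old] = new
    return mapping

def rename_vcf_lines(lines, mapping):
    """Yield VCF lines with chromosome names replaced; names missing from the map are kept."""
    contig_re = re.compile(r"^(##contig=<ID=)([^,>]+)")
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<ID="):
            # Rename the ID inside the ##contig header line
            yield contig_re.sub(lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), line)
        elif line.startswith("#"):
            yield line  # other header lines pass through unchanged
        else:
            fields = line.split("\t")
            fields[0] = mapping.get(fields[0], fields[0])  # CHROM column
            yield "\t".join(fields)
```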
1. Split by sample and remove heterozygous and missing (ungenotyped) sites
python3 split_vcf_by_samples_pair_with_simian_filter_homozygous_only.py -i all.rename.daerwen.chr-rename.sample-rename.vcf -s id.txt -o output_dir --simian-id Simian3
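The split script is not shown. The sketch below only illustrates the filtering rule named in step 1, assuming each sub-VCF pairs one sample with Simian3 and keeps a site only when both genotypes are called and homozygous. Function names are hypothetical.

```python
import re

def is_called_homozygous(gt_field):
    """True for genotypes like 0/0, 1/1 or 1|1; False for heterozygous or missing (./.) calls."""
    alleles = re.split(r"[/|]", gt_field.split(":")[0])  # GT is the first FORMAT subfield
    if "." in alleles or "" in alleles:
        return False
    return len(set(alleles)) == 1

def filter_pair(lines, sample, partner="Simian3"):
    """Yield a two-sample VCF (sample + partner) keeping only sites where both are called homozygous."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Column indices of the two samples; sample columns follow the 9 fixed VCF columns
            i, j = fields.index(sample), fields.index(partner)
            yield "\t".join(fields[:9] + [fields[i], fields[j]])
        elif is_called_homozygous(fields[i]) and is_called_homozygous(fields[j]):
            yield "\t".join(fields[:9] + [fields[i], fields[j]])
```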
2. Search each sub-VCF for introgressed segments
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/method2/vcf_analyzer_v1.1.py -i all.daerwen.chr-rename.sample-rename.vcf -o v1.1 &
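1. Find introgressed segments identical to the reference
# python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/method2/vcf_analyzer_v1.1.py -i <vcf_file> -o <output_dir>
# By default, Simian 3 (Simian3) and the reference are used as the two parents; strategy 2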
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/method2/vcf_analyzer_v1.1.py -i all.daerwen.chr-rename.sample-rename.vcf -o v1.1 &
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/method2/vcf_analyzer_v1.2.2.py -i all.daerwen.chr-rename.sample-rename.vcf -o v1.2.2-500k &
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/method2/vcf_analyzer_v1.2.2.py -i all.daerwen.chr-rename.sample-rename.vcf -o v1.2.2-5M/ --min-length 5M &
## /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/vcf_reference_identity_analyzer.py strategy 1: compare with the reference
## /share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/vcf_sample_comparison_analyzer.py strategy 1: compare the current sample with a target sample
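How vcf_analyzer_v1.x and strategies 1/2 define a segment is not documented here. The following is a simplified, hypothetical run-based illustration with Simian 3 and the reference as the two parents. Only sites where Simian 3 is homozygous ALT (so the parents differ) are informative, and a sample that is 0/0 there counts as reference-like. Consecutive reference-like sites on one chromosome form a candidate segment, kept only if it spans at least a minimum length given as a string such as 500k or 5M. The 500k default is a placeholder. Real tools usually also tolerate isolated discordant sites; this sketch does not.

```python
import re
from itertools import groupby

def parse_size(text):
    """Convert '500k' -> 500000, '5M' -> 5000000, '2000' -> 2000 (suffix is case-insensitive)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([kKmM]?)", text.strip())
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    factor = {"": 1, "k": 1_000, "m": 1_000_000}[m.group(2).lower()]
    return int(float(m.group(1)) * factor)

def classify_site(sample_gt, simian_gt):
    """'ref' if the sample matches the reference where Simian 3 is homozygous ALT,
    'other' if it does not, None if the site is uninformative."""
    if simian_gt not in ("1/1", "1|1"):
        return None  # the two parents do not differ here
    return "ref" if sample_gt in ("0/0", "0|0") else "other"

def reference_segments(sites, min_length="500k"):
    """sites: (chrom, pos, state) tuples at informative sites, sorted by chrom and pos.
    Returns (chrom, first_pos, last_pos, n_sites) for runs of consecutive 'ref' sites
    whose span (last_pos - first_pos + 1) is at least min_length."""
    min_bp = parse_size(min_length)
    segments = []
    for chrom, chrom_sites in groupby(sites, key=lambda s: s[0]):
        for state, run in groupby(chrom_sites, key=lambda s: s[2]):
            run = list(run)
            if state == "ref" and run[-1][1] - run[0][1] + 1 >= min_bp:
                segments.append((chrom, run[0][1], run[-1][1], len(run)))
    return segments
```

The `sites` list would be built by applying `classify_site` to each position and dropping the `None` results.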
2. Plot the chromosomal distribution of the reference-derived introgressed segments
/share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/fig1_visualize_introgressed_segments.py
3. Optional (can be skipped)
1) Merge introgressed segments of different origins within the same sample
/share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/merge_genomic_intervals.py
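merge_genomic_intervals.py is not shown. This is a minimal sketch of interval merging, assuming segments are (chrom, start, end, source) tuples with inclusive coordinates. Overlapping segments, or ones within an optional gap, are fused, and every contributing source is recorded. Names are hypothetical.

```python
def merge_intervals(intervals, gap=0):
    """Merge (chrom, start, end, source) intervals (inclusive coordinates) when
    start - previous end <= gap (gap=0: overlapping only; gap=1 also joins adjacent ones).
    Merged entries list all contributing sources, comma-separated."""
    merged = []  # each item: [chrom, start, end, [sources]]
    for chrom, start, end, source in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= gap:
            last[2] = max(last[2], end)  # extend the current merged interval
            if source not in last[3]:
                last[3].append(source)
        else:
            merged.append([chrom, start, end, [source]])
    return [(c, s, e, ",".join(srcs)) for c, s, e, srcs in merged]
```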
2) Plot the origin of each region within a sample
/share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/fig2_chromosome_region_plotter.py
4. Find introgressed segments shared among samples
/share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/find_common_intervals.py # start and end must match exactly
/share/nas1/yuj/script/ddradANDreseqANDgenome/introgressed_fragment/find_overlapping_intervals.py # overlapping segments allowed
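The two modes above differ only in how "shared" is defined. A sketch of both, assuming each sample's segments are (chrom, start, end) tuples with inclusive coordinates. In overlap mode it further assumes that "shared" means the region covered in every sample and that a sample's own segments do not overlap each other. These semantics and the function names are assumptions, not taken from the scripts.

```python
from functools import reduce

def common_exact(samples):
    """Intervals present in every sample with identical chrom, start and end."""
    sets = [set(map(tuple, ivs)) for ivs in samples.values()]
    return sorted(reduce(set.intersection, sets)) if sets else []

def intersect_two(a, b):
    """Overlapping parts of two interval lists (inclusive coordinates).
    Simple O(len(a) * len(b)) loop; fine for a few thousand segments."""
    out = set()
    for ca, sa, ea in a:
        for cb, sb, eb in b:
            if ca == cb and max(sa, sb) <= min(ea, eb):
                out.add((ca, max(sa, sb), min(ea, eb)))
    return sorted(out)

def common_overlap(samples):
    """Regions covered by a segment in every sample."""
    lists = [list(ivs) for ivs in samples.values()]
    return reduce(intersect_two, lists) if lists else []
```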
II. SNP-dense regions
mamba activate pylatest
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/vcf_tools/snp_density_plot_and_variation_region_out.py -v group2.tm-1.onlychr.onlydiff.snp.recode.vcf -c /share/nas2/yuj/project/2025/re-sequencing/GP-20241021-9521-2_20250104/001reseq/01tm-1/data/chr.list -w 1M
for i in 50k 100k 200k 500k 1m 2m 5m 10m;do time python3 auto-snp_density_plot_and_variation_region_out.py -v group2.tm-1.onlychr.onlydiff.snp.recode.vcf -c /share/nas2/yuj/project/2025/re-sequencing/GP-20241021-9521-2_20250104/001reseq/01tm-1/data/chr.list -o $i.auto.png -w $i -s $i.all_snp_density.txt > $i.log & done
Pick one window size and use the dense regions found with it.
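The selection rule used by snp_density_plot_and_variation_region_out.py is not documented here. The sketch below shows the general windowing idea with the window size in bp (e.g. `-w 1M` = 1,000,000). The "keep the top fraction of windows by SNP count" rule is a hypothetical stand-in for whatever threshold the script applies, and the names are hypothetical.

```python
from collections import Counter

def window_counts(positions, window):
    """positions: iterable of (chrom, pos) with 1-based pos.
    Returns a Counter keyed by (chrom, i); window i covers [i*window + 1, (i+1)*window]."""
    return Counter((chrom, (pos - 1) // window) for chrom, pos in positions)

def dense_windows(positions, window, top_fraction=0.05):
    """Hypothetical rule: keep the top `top_fraction` of non-empty windows by SNP count.
    Returns (chrom, start, end, count) tuples sorted by chrom and start."""
    counts = window_counts(positions, window)
    if not counts:
        return []
    k = max(1, int(len(counts) * top_fraction))
    return sorted((c, i * window + 1, (i + 1) * window, n) for (c, i), n in counts.most_common(k))
```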
III. Marker development
3.1 KASP markers
0. Prepare data
Select the samples of interest.
Remove non-chromosomal sequences (scaffolds):
grep -v -i "Scaffold" group2.tm-1.sample-rename.vcf.recode.vcf > group2.tm-1.onlychr.snp.recode.vcf
Remove invariant sites (every sample homozygous reference, 0/0):
time bcftools view -i 'GT!="0/0" && GT!="0|0"' group2.tm-1.onlychr.snp.recode.vcf > group2.tm-1.onlychr.onlydiff.snp.recode.vcf
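The two filters above can also be expressed in a single pass. This Python sketch approximates them but is not a byte-for-byte equivalent. It looks for "scaffold" only in header lines and the CHROM column, whereas grep -v -i drops any line containing the word. It also counts only called ALT alleles as differences, so a site whose only non-0/0 genotypes are missing (./.) is dropped. Function names are hypothetical.

```python
import re

def has_alt_call(gt_field):
    """True if the GT subfield contains at least one called non-reference allele."""
    alleles = re.split(r"[/|]", gt_field.split(":")[0])
    return any(a not in ("0", ".", "") for a in alleles)

def keep_line(line):
    """Drop scaffold lines and sites where no sample carries a called ALT allele."""
    if line.startswith("#"):
        return "scaffold" not in line.lower()  # e.g. ##contig=<ID=Scaffold...> headers
    fields = line.rstrip("\n").split("\t")
    if "scaffold" in fields[0].lower():        # non-chromosomal sequence
        return False
    return any(has_alt_call(gt) for gt in fields[9:])  # sample columns start at index 9
```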
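Then continue with the standard KASP pipeline.
3.1.1 Screening within introgressed segments
06 KASP pipeline (general): reference-based resequencing & reduced-representation sequencing, or reference-free reduced-representation sequencing
See Section 2.6.
3.1.2 Screening within dense regions
1. Primers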
cd kasp_analysis/primer_design && cut -f 1,2,3 /share/nas2/yuj/project/2025/re-sequencing/GP-20250509-10885_20250512/02tm-1/2m-region.txt > 2m-region.pos && for i in *anno*xls;do python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/vcf_tools/vcf_like_extractor.py -i $i -p 2m-region.pos -o filter.$i & done
2. Genotype data
cd kasp_analysis/selectVariant
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/vcf_tools/vcf_like_extractor.py -i samples.kasp.genotype.select.xls -p /share/nas2/yuj/project/2025/re-sequencing/GP-20250509-10885_20250512/02tm-1/2m-region.txt -o filter.samples.kasp.genotype.select.xls
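vcf_like_extractor.py is used in both steps to keep only rows inside the target regions. The following sketch of that filtering rests on three assumptions. The table has CHROM and POS in its first two tab-separated columns. Rows whose POS is not an integer are header rows. 2m-region.txt is BED-style (0-based start, end-exclusive), since the same file is passed to bedtools getfasta in 3.3. Under that convention a 1-based position p lies in a region when start < p <= end. Names are hypothetical.

```python
def load_regions(lines):
    """Read BED-like 'chrom start end' rows (extra columns ignored) into chrom -> [(start, end)]."""
    regions = {}
    for line in lines:
        parts = line.split()
        if line.startswith("#") or len(parts) < 3:
            continue
        regions.setdefault(parts[0], []).append((int(parts[1]), int(parts[2])))
    return regions

def in_regions(chrom, pos, regions):
    """1-based pos lies in BED interval [start, end) (0-based) when start < pos <= end."""
    return any(start < pos <= end for start, end in regions.get(chrom, ()))

def filter_table(lines, regions, chrom_col=0, pos_col=1):
    """Yield header rows (non-integer POS) and rows whose CHROM/POS fall inside a region."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= pos_col or not fields[pos_col].isdigit():
            yield line  # treated as a header/comment row
        elif in_regions(fields[chrom_col], int(fields[pos_col]), regions):
            yield line
```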
3.2 InDel markers
Paths:
/share/nas2/zhouxy/pipline2/indel-primer-pip/v1.3/indel-primer-pip.pl
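/share/nas1/yuj/pipline2/indel-primer-pip/v1.3/ # same pipeline with an updated Primer3 version
0. Prepare data
Start from the InDel VCF with invariant sites and non-chromosomal sequences removed:
grep -v "^#" group2.tm-1.onlychr.onlydiff.indel.recode.vcf | cut -f 1,2 > indel.pos.txt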
Add the header line "#CHR POS" (tab-separated) at the top of the file.
indel.pos.txt should look like this:
#CHR POS
A02 39305692
A02 39319333
A02 39319502
A02 39319523
A02 39319664
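The extraction and header steps can also be done in one pass. Here is a small sketch (function name hypothetical) that produces the tab-separated indel.pos.txt layout shown above from the lines of the InDel VCF.

```python
def vcf_to_pos(lines):
    """Yield the '#CHR\tPOS' header, then CHROM\tPOS for every VCF data line."""
    yield "#CHR\tPOS"
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF header lines and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        yield f"{chrom}\t{pos}"
```

Usage, with the file names from this section: `open("indel.pos.txt", "w").write("\n".join(vcf_to_pos(open("group2.tm-1.onlychr.onlydiff.indel.recode.vcf"))) + "\n")`.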
1. Generate the commands
perl /share/nas1/yuj/pipline2/indel-primer-pip/v1.3/indel-primer-pip.pl -i indel.pos.txt -g genome.fna -o primer_design
2. Run the commands
sh primer_design/commands.sh
3. Filter to the target regions
cut -f 1,2,3 /share/nas2/yuj/project/2025/re-sequencing/GP-20250509-10885_20250512/02tm-1/2m-region.txt > 2m-region.pos
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/vcf_tools/vcf_like_extractor.py -i primer-design.result.info.xls -p 2m-region.pos -o filter.primer-design.result.info.xls
3.3 SSR markers
1. Select regions
2. Extract sequences
bedtools getfasta -fi TM-1_V2.1.fa -bed 2m-region.txt -fo output.fna
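# The header A01:102000000-104000000 uses BED coordinates (0-based, end-exclusive): 0-based positions 102000000-103999999, i.e. 1-based positions 102000001-104000000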
3. Find SSRs
perl misa.pl output.fna
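MISA scans each sequence for perfect microsatellites using a minimum repeat count per unit length, taken from misa.ini. The sketch below mimics only that core idea with regular expressions. The thresholds (1-10 2-6 3-5 4-5 5-5 6-5) are the ones commonly shipped in misa.ini and should be replaced with the values actually in use. It ignores compound SSRs and interruptions, which MISA also reports. Names are hypothetical.

```python
import re

# Minimum repeat count per unit length (assumed misa.ini-style defaults)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_compound_unit(unit):
    """True if unit is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT');
    such runs are reported under the shorter unit length instead."""
    n = len(unit)
    return any(n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n))

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (unit, repeats, start, end) for perfect SSRs, 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, min_n in min_repeats.items():
        # A k-mer followed by at least (min_n - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_compound_unit(unit):
                continue
            hits.append((unit, len(m.group(0)) // k, m.start() + 1, m.end()))
    return sorted(hits, key=lambda h: h[2])
```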
4. Design primers for all loci
(1) Generate the Primer3 input file (output.p3in)
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/ssr_marker/ssr_genome2p3in.py -g output.fna -s output.fna.misa -o output.p3in
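ssr_genome2p3in.py converts the MISA hits into Primer3 input. Below is a sketch of one such Boulder-IO record, assuming a fixed flank around each SSR. The flank size, product size range and ID scheme are placeholders, not the script's settings. SEQUENCE_TARGET uses Primer3's default 0-based indexing (PRIMER_FIRST_BASE_INDEX=0). PRIMER_NUM_RETURN=5 is Primer3's default number of pairs, which matches the 5-pair criterion in step 5.

```python
def ssr_to_p3(seq_id, seq, ssr_start, ssr_end, flank=100, product_range="100-300"):
    """Build one Primer3 Boulder-IO record for an SSR at 1-based inclusive
    positions ssr_start..ssr_end of `seq`, using up to `flank` bp on each side."""
    left = max(0, ssr_start - 1 - flank)     # 0-based start of the template
    right = min(len(seq), ssr_end + flank)   # 0-based, end-exclusive
    template = seq[left:right]
    target_start = ssr_start - 1 - left      # 0-based SSR offset inside the template
    target_len = ssr_end - ssr_start + 1
    return "\n".join([
        f"SEQUENCE_ID={seq_id}_{ssr_start}",  # hypothetical ID scheme
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{target_len}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_range}",
        "PRIMER_NUM_RETURN=5",
        "=",                                  # Boulder-IO record terminator
    ])
```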
(2) Design primers
perl /share/nas1/yuj/script/ddradANDreseqANDgenome/ssr_marker/ssr_primer_designer.pl -i output.p3in -t 63 -o ssr_primer_designer
# The script's built-in merge fails because there are so many output files, so concatenate them manually
find ssr_primer_designer/tmp_p3out/ -type f -name "*" -print0 | xargs -0 cat > output.p3out
5. Keep markers for which 5 primer pairs were designed successfully
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/ssr_marker/filter_from_all_primer.py -i output.fna.misa -d ssr_primer_designer/tmp_p3out/ -o filter_markers.txt
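Step 5 can be understood as reading each Primer3 record and checking PRIMER_PAIR_NUM_RETURNED. This sketch assumes the Primer3 output is plain Boulder-IO: TAG=VALUE lines, with each record terminated by a line containing only "=". It is not the filter_from_all_primer.py implementation, and the function names are hypothetical.

```python
def parse_boulder(text):
    """Split Primer3 Boulder-IO output into a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "=":
            records.append(current)  # end of one record
            current = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return records

def ids_with_n_pairs(records, n=5):
    """SEQUENCE_IDs whose record reports at least n returned primer pairs."""
    return [r["SEQUENCE_ID"] for r in records
            if int(r.get("PRIMER_PAIR_NUM_RETURNED", 0)) >= n]
```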
6. Write the final result table
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/ssr_marker/parse_p3_out2xls.py -i filter_markers.txt -p output.p3out -o ssr_marker.primer.xls
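The sketch below turns one parsed Primer3 record (a dict of tag to value) into one table row per primer pair, using Primer3's standard output tags. The column names and the tab-separated output are assumptions, not the layout of ssr_marker.primer.xls. Function names are hypothetical.

```python
def primer_rows(record):
    """One row per returned primer pair of a Primer3 record."""
    rows = []
    for i in range(int(record.get("PRIMER_PAIR_NUM_RETURNED", 0))):
        rows.append({
            "id": record["SEQUENCE_ID"],
            "pair": i + 1,
            "left": record[f"PRIMER_LEFT_{i}_SEQUENCE"],
            "right": record[f"PRIMER_RIGHT_{i}_SEQUENCE"],
            "left_tm": float(record[f"PRIMER_LEFT_{i}_TM"]),
            "right_tm": float(record[f"PRIMER_RIGHT_{i}_TM"]),
            "product_size": int(record[f"PRIMER_PAIR_{i}_PRODUCT_SIZE"]),
        })
    return rows

def rows_to_tsv(rows):
    """Render rows as tab-separated text with a header line."""
    cols = ["id", "pair", "left", "right", "left_tm", "right_tm", "product_size"]
    lines = ["\t".join(cols)] + ["\t".join(str(r[c]) for c in cols) for r in rows]
    return "\n".join(lines)
```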
7. Convert to absolute genomic coordinates
python3 /share/nas1/yuj/script/ddradANDreseqANDgenome/ssr_marker/convert_ssr_positions.py -i ssr_marker.primer.xls
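SSR positions are relative to each extracted sequence. The FASTA ID written by bedtools getfasta has the form chrom:start-end, with a 0-based start. Assuming the relative positions are 1-based, as MISA's start/end columns are, the 1-based genomic position is bed_start + relative_position. Primer3's own coordinates are 0-based by default, so add 1 to those first. Helper names are hypothetical, and convert_ssr_positions.py may work differently.

```python
import re

def parse_getfasta_id(seq_id):
    """'A01:102000000-104000000' -> ('A01', 102000000, 104000000) (BED: 0-based, end-exclusive)."""
    m = re.fullmatch(r"(.+):(\d+)-(\d+)", seq_id)
    if not m:
        raise ValueError(f"not a chrom:start-end id: {seq_id!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))

def to_absolute(seq_id, rel_start, rel_end):
    """Convert 1-based positions within an extracted sequence to 1-based genome positions."""
    chrom, bed_start, _ = parse_getfasta_id(seq_id)
    return chrom, bed_start + rel_start, bed_start + rel_end
```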