Welcome to rendseq’s documentation!
This is an in-development package for the analysis of end-enriched RNA sequencing data.
For more information, see our GitHub page: https://github.com/miraep8/rendseq
File Functions
Functions for fetching, creating, and opening raw and processed data files.
- rendseq.file_funcs.make_new_dir(dir_parts)
Create a new directory and return valid path to it.
- Parameters:
dir_parts – a list of strings to be joined to make the directory.
- Returns:
dir_str – the directory name.
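The behaviour described above (join a list of path parts, create the directory, return its path) can be sketched with the standard library. This is a minimal illustration, not the package's implementation; `make_dir_sketch` is a hypothetical name, and reusing an existing directory without error is an assumption.

```python
import os
import tempfile


def make_dir_sketch(dir_parts):
    """Join path components, create the directory if needed, return its path."""
    dir_str = os.path.join(*dir_parts)
    # Assumption: an already existing directory is reused rather than
    # treated as an error.
    os.makedirs(dir_str, exist_ok=True)
    return dir_str


# Usage: build a nested output directory under a temporary root.
root = tempfile.mkdtemp()
out_dir = make_dir_sketch([root, "results", "z_scores"])
```

Passing the parts as a list keeps the call independent of the platform's path separator.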
- rendseq.file_funcs.open_wig(filename)
Open the provided wig file and return its contents as a 2xn array.
- Parameters:
filename (string, required) – the file you want to open.
- Returns:
reads (2xn array) – the first column is the position and the second column is the count at that position (raw read, z_score, etc.).
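To make the position/count layout concrete, here is a rough sketch of reading a simple wig file into rows of `[position, value]`. It assumes the common variableStep layout (header lines followed by `position value` pairs) and uses the hypothetical name `read_wig_sketch`; it is not the package's own reader, and the headers the package expects are not stated here.

```python
import os
import tempfile


def read_wig_sketch(filename):
    """Return a list of [position, value] rows from a simple wig file."""
    rows = []
    with open(filename) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and header lines such as "track ..." or
            # "variableStep chrom=...", which do not start with a number.
            if len(fields) != 2 or not fields[0].isdigit():
                continue
            rows.append([int(fields[0]), float(fields[1])])
    return rows


# Usage: write a tiny variableStep wig file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "example.wig")
with open(path, "w") as handle:
    handle.write("track type=wiggle_0\n")
    handle.write("variableStep chrom=chr_demo\n")
    handle.write("10 3.0\n11 0.5\n15 42.0\n")

reads = read_wig_sketch(path)
```

Each row pairs a position with its value, matching the "first column position, second column count" description above.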
- rendseq.file_funcs.write_wig(wig_track, wig_file_name, chrom_name)
Write provided data to the wig file.
- Parameters:
wig_track (array, required) – the wig data you wish to write (a 2xn array).
wig_file_name (string) – the new file you will write to.
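A rough sketch of writing position/value rows to a wig file follows. The header lines use the generic variableStep wiggle layout, which is an assumption; the exact header the package writes is not documented here. `write_wig_sketch` is a hypothetical name chosen to mirror the three arguments above.

```python
import os
import tempfile


def write_wig_sketch(wig_track, wig_file_name, chrom_name):
    """Write [position, value] rows as a variableStep wig file."""
    with open(wig_file_name, "w") as handle:
        # Assumed header: generic wiggle track line plus a variableStep
        # declaration naming the chromosome.
        handle.write("track type=wiggle_0\n")
        handle.write(f"variableStep chrom={chrom_name}\n")
        for position, value in wig_track:
            handle.write(f"{int(position)} {value}\n")


# Usage: write three rows and read the raw lines back.
track = [[10, 3.0], [11, 0.5], [15, 42.0]]
path = os.path.join(tempfile.mkdtemp(), "out.wig")
write_wig_sketch(track, path, "chr_demo")

with open(path) as handle:
    lines = handle.read().splitlines()
```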
Functions for Calculating Z-Scores
Functions needed for z-score transforming raw rendSeq data.
- rendseq.zscores.main_zscores()
Run Z-score calculations.
Effect: Writes messages to standard out. If the --save-file flag is set, also writes output to disk.
- rendseq.zscores.z_scores(reads, gap=5, w_sz=50, percent_trim=0, winsorize=True)
Perform modified z-score transformation of reads.
- Parameters:
reads (2xn array) – raw rendseq reads.
gap (integer) – … interest that should be excluded in the z-score calculation.
w_sz (integer) – … one should include in the z-score calculation.
percent_trim – the fraction of the top reads that should be dropped before calculating the mean and std; e.g. 0.1 means the top 10% of reads are dropped.
winsorize (bool) – whether, after trimming, any reads more than 1.5 std from the mean should be dropped.
- Returns:
z_scores (2xn array) – the first column is the position and the second column is the z_score.
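The general idea of a windowed z-score with an excluded gap can be sketched as follows. This is an illustration under stated assumptions, not the package's algorithm: `z_scores_sketch` and `trim_top` are hypothetical names, `gap` and `w_sz` are interpreted here as counts of neighbouring rows (the package may measure them differently), both flanks are pooled into one background, and the winsorize step is omitted.

```python
import statistics


def trim_top(values, percent_trim):
    """Drop the top `percent_trim` fraction of values (0.1 drops the top 10%)."""
    n_drop = int(len(values) * percent_trim)
    ordered = sorted(values)
    return ordered[: len(ordered) - n_drop]


def z_scores_sketch(reads, gap=5, w_sz=50, percent_trim=0):
    """Score each [position, count] row against its flanking rows."""
    counts = [count for _, count in reads]
    scored = []
    for i, (position, count) in enumerate(reads):
        # Assumption: each flank holds up to w_sz rows and starts
        # gap rows away from the row being scored.
        left = counts[max(0, i - gap - w_sz): max(0, i - gap)]
        right = counts[i + gap + 1: i + gap + 1 + w_sz]
        background = trim_top(left + right, percent_trim)
        if len(background) < 2:
            continue  # not enough neighbours to estimate a spread
        mean = statistics.mean(background)
        std = statistics.pstdev(background)
        if std == 0:
            continue  # flat background: z-score undefined
        scored.append([position, (count - mean) / std])
    return scored


# Usage: a low alternating background with one sharp spike at position 10.
reads = [[pos, 1 if pos % 2 == 0 else 2] for pos in range(21)]
reads[10] = [10, 100]
scored = z_scores_sketch(reads, gap=2, w_sz=5)
best = max(scored, key=lambda row: row[1])
```

Excluding the gap keeps the immediate shoulders of the scored row out of its own background estimate, so a sharp spike stands out against its surroundings.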
Functions for calling peaks from Z-Scores
Take normalized raw data and find the peaks in it.
- rendseq.make_peaks.hmm_peaks(z_scores, i_to_p=0.001, p_to_p=0.6666666666666666, peak_center=10, spread=2)
Fit peaks to the provided z_scores data set using the Viterbi algorithm.
- Parameters:
z_scores (2xn array) – first column is location; second column is a modified z_score for that position.
i_to_p (float) – probability of transitioning from the internal state to the peak state. The default value is stated as 1/2000 (note that the signature lists 0.001), based on the assumption of geometrically distributed transcript lengths with mean length 1000. Should be a robust parameter.
p_to_p (float) – … 1/1.5.
peak_center (float) – … for the peak state.
spread (float) – … distribution.
- Returns:
peaks – a 2xn array with the first column being position and the second column being a peak assignment.
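For readers unfamiliar with Viterbi decoding, the following sketch labels each position as internal (0) or peak (1) with a two-state hidden Markov model. It is not the package's implementation. The emission model is an assumption: internal z-scores are taken as normal with mean 0 and standard deviation 1, and peak z-scores as normal with mean `peak_center` and standard deviation `spread`. The start-state handling and the name `viterbi_peaks_sketch` are likewise hypothetical.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi_peaks_sketch(z_scores, i_to_p=0.001, p_to_p=1 / 1.5,
                         peak_center=10, spread=2):
    """Label each [position, z] row 0 (internal) or 1 (peak) via Viterbi."""
    if not z_scores:
        return []
    # log_trans[a][b] = log P(next state b | current state a)
    log_trans = [
        [math.log(1 - i_to_p), math.log(i_to_p)],
        [math.log(1 - p_to_p), math.log(p_to_p)],
    ]

    def log_emit(state, z):
        # Assumed emissions: internal ~ N(0, 1), peak ~ N(peak_center, spread).
        if state == 0:
            return log_normal_pdf(z, 0.0, 1.0)
        return log_normal_pdf(z, peak_center, spread)

    # Assumed start: the chain enters from the internal state.
    first_z = z_scores[0][1]
    score = [log_trans[0][s] + log_emit(s, first_z) for s in (0, 1)]
    back = []
    for _, z in z_scores[1:]:
        pointers, new_score = [], []
        for s in (0, 1):
            # Best previous state for landing in state s at this row.
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit(s, z))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [[pos, label] for (pos, _), label in zip(z_scores, path)]


# Usage: two adjacent high z-scores surrounded by background.
zs = [[1, 0.1], [2, -0.4], [3, 0.3], [4, 11.0], [5, 9.5], [6, 0.2], [7, -0.1]]
peaks = viterbi_peaks_sketch(zs)
```

Working in log space avoids numerical underflow on long tracks, since the path score becomes a sum of log probabilities rather than a product of small numbers.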
- rendseq.make_peaks.main_make_peaks()
Run the main peak making from command line.
- rendseq.make_peaks.parse_args_make_peaks(args)
Parse command line arguments.
- rendseq.make_peaks.thresh_peaks(z_scores, thresh=None, method='kink')
Find peaks by calling z-scores above a threshold as a peak.
- Parameters:
z_scores – a 2xn array of nt positions and z-scores at each position.
thresh – the threshold value to use. If none is provided, it will be automatically calculated.
method – the method used to automatically calculate the z-score threshold if none is provided. The default method is “kink”.
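Threshold-based calling with an explicit threshold can be sketched in a few lines. `thresh_peaks_sketch` is a hypothetical name, and treating the comparison as strictly greater-than is an assumption. The automatic "kink" method is not reproduced here because its details are not given above.

```python
def thresh_peaks_sketch(z_scores, thresh):
    """Mark each [position, z] row 1 if z exceeds `thresh`, else 0."""
    return [[position, 1 if z > thresh else 0] for position, z in z_scores]


# Usage: call peaks wherever the z-score exceeds 7.
z = [[100, 0.4], [101, 13.2], [102, -0.7], [103, 8.1]]
peaks = thresh_peaks_sketch(z, thresh=7)
```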