Welcome to rendseq’s documentation!

This is an in-development package for the analysis of end-enriched RNA sequencing data.

For more information, see our GitHub page: https://github.com/miraep8/rendseq

File Functions

Functions for fetching, creating, and opening raw and processed data files.

rendseq.file_funcs.make_new_dir(dir_parts)

Create a new directory and return a valid path to it.

Parameters:

  • dir_parts (list of strings) – the strings to be joined to make the directory name.

Returns:

  • dir_str – the directory name.
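The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own code: make_new_dir_sketch is a hypothetical name, and it assumes the parts are concatenated as given, since the description does not say whether a path separator is inserted between them.

```python
import os
import tempfile


def make_new_dir_sketch(dir_parts):
    """Join the parts, create the directory if needed, and return its path."""
    # Assumption: parts are concatenated as-is, so the caller supplies separators.
    dir_str = "".join(dir_parts)
    # exist_ok=True makes repeated calls with the same parts harmless.
    os.makedirs(dir_str, exist_ok=True)
    return dir_str


# Work inside a throwaway directory so the example leaves no clutter behind.
base = tempfile.mkdtemp()
new_dir = make_new_dir_sketch([base, os.sep, "Z_scores"])
print(os.path.isdir(new_dir))
```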

rendseq.file_funcs.open_wig(filename)

Open the provided wig file and return its contents as a 2xn array.

Parameters:

  • filename (string, required) – the name of the file to open.

Returns:

  • reads (2xn array) – the first column is the position and the second column is the count at that position (raw read, z_score, etc.).
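To make the returned layout concrete, here is a small sketch of reading a wig file into position/value rows. It is an illustration under stated assumptions rather than the package's implementation: open_wig_sketch is a hypothetical name, the file is assumed to use the variableStep layout (one "position value" pair per line), and a plain list of rows stands in for the array.

```python
import os
import tempfile


def open_wig_sketch(filename):
    """Read a variableStep-style wig file into a list of [position, value] rows."""
    reads = []
    with open(filename) as handle:
        for line in handle:
            fields = line.split()
            # Header lines ("track ...", "variableStep chrom=...") and blank
            # lines do not start with a number, so they are skipped.
            if len(fields) != 2 or not fields[0].isdigit():
                continue
            reads.append([int(fields[0]), float(fields[1])])
    return reads


# Write a tiny demo file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wig")
with open(demo_path, "w") as handle:
    handle.write("track type=wiggle_0\nvariableStep chrom=chr1\n1 5\n2 0\n4 12.5\n")

reads = open_wig_sketch(demo_path)
print(reads)  # [[1, 5.0], [2, 0.0], [4, 12.5]]
```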

rendseq.file_funcs.write_wig(wig_track, wig_file_name, chrom_name)

Write provided data to the wig file.

Parameters:
  • wig_track (2xn array, required) – the wig data you wish to write.

  • wig_file_name (string) – the new file you will write to.
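A sketch of the writing step may help show how the three arguments fit together. The header layout below (a track line followed by a variableStep declaration carrying the chromosome name) follows the general wig format and is an assumption about what the package emits; write_wig_sketch is a hypothetical name.

```python
import os
import tempfile


def write_wig_sketch(wig_track, wig_file_name, chrom_name):
    """Write [position, value] rows to a variableStep wig file."""
    with open(wig_file_name, "w") as handle:
        # Assumed header: a track line plus a variableStep declaration.
        handle.write("track type=wiggle_0\n")
        handle.write(f"variableStep chrom={chrom_name}\n")
        for position, value in wig_track:
            handle.write(f"{int(position)} {value}\n")


out_path = os.path.join(tempfile.mkdtemp(), "out.wig")
write_wig_sketch([[1, 5.0], [2, 0.0], [4, 12.5]], out_path, "chr1")

with open(out_path) as handle:
    lines = handle.read().splitlines()
print(lines[:2])
```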

Functions for Calculating Z-Scores

Functions needed for z-score transforming raw rendSeq data.

rendseq.zscores.main_zscores()

Run Z-score calculations.

Effect: Writes messages to standard out. If the --save-file flag is set, also writes output to disk.

rendseq.zscores.z_scores(reads, gap=5, w_sz=50, percent_trim=0, winsorize=True)

Perform modified z-score transformation of reads.

Parameters:
  • reads (2xn array) – raw rendseq reads.

  • gap (integer) – … interest that should be excluded in the z_score calculation.

  • w_sz (integer) – … one should include in the z-score calculation.

  • percent_trim – what fraction of the top reads should be dropped before calculating the mean and std, i.e. 0.1 means the top 10% of reads are dropped.

  • winsorize (bool) – whether, after trimming, any reads more than 1.5 std from the mean should be dropped.

Returns:

  • z_scores (2xn array) – the first column is the position and the second column is the z_score.
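The roles of gap, w_sz, and percent_trim can be illustrated with a simplified windowed z-score. This is a sketch under several assumptions, not the package's algorithm: z_scores_sketch is a hypothetical name, rows are assumed to be at consecutive positions so that the window can be taken by row index, both sides of the position are pooled into one window, and the winsorize step is left out.

```python
import statistics


def z_scores_sketch(reads, gap=5, w_sz=50, percent_trim=0.0):
    """Score each count against nearby counts, skipping `gap` rows on each side."""
    counts = [count for _, count in reads]
    scored = []
    for i, (position, count) in enumerate(reads):
        # Up to w_sz neighbours on each side, excluding the gap next to row i.
        left = counts[max(0, i - gap - w_sz):max(0, i - gap)]
        right = counts[i + gap + 1:i + gap + 1 + w_sz]
        window = sorted(left + right)
        # Optionally drop the highest fraction of neighbouring counts.
        n_drop = int(len(window) * percent_trim)
        if n_drop:
            window = window[:-n_drop]
        if len(window) < 2:
            scored.append([position, 0.0])
            continue
        mean = statistics.fmean(window)
        std = statistics.pstdev(window)
        scored.append([position, (count - mean) / std if std > 0 else 0.0])
    return scored


# Background alternating between 9 and 11 reads, with one sharp spike.
reads = [[pos, 9.0 if pos % 2 else 11.0] for pos in range(1, 42)]
reads[20][1] = 100.0  # spike at position 21
scored = z_scores_sketch(reads, gap=2, w_sz=10)
spike_score = scored[20][1]
print(spike_score)
```

Excluding a gap around each position keeps the shoulders of a peak out of its own background estimate, which is why the spike stands out so strongly against its neighbours here.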

Functions for calling peaks from Z-Scores

Take normalized raw data and find the peaks in it.

rendseq.make_peaks.hmm_peaks(z_scores, i_to_p=0.001, p_to_p=0.6666666666666666, peak_center=10, spread=2)

Fit peaks to the provided z_scores data set using the Viterbi algorithm.

Parameters:
  • z_scores (2xn array) – the first column is the location; the second column is a modified z_score for that position.

  • i_to_p (float) – … probability of transitioning from the internal state to the peak state. The stated default value is 1/2000 (the signature shows 0.001), based on the assumption of geometrically distributed transcript lengths with mean length 1000. Should be a robust parameter.

  • p_to_p (float) – … 1/1.5.

  • peak_center (float) – … for the peak state.

  • spread (float) – … distribution.

Returns:

  • peaks (2xn array) – the first column is the position and the second column is a peak assignment.
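For readers unfamiliar with the Viterbi algorithm, the following sketch decodes a two-state model (internal and peak) in log space. It is an illustration only and makes assumptions the description above does not confirm: viterbi_peaks_sketch is a hypothetical name, both states are given Gaussian emissions, the internal state is assumed to be N(0, 1), peak_center and spread are treated as the mean and standard deviation of the peak state, and the starting probabilities reuse the internal-state transition row.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi_peaks_sketch(z_scores, i_to_p=0.001, p_to_p=2 / 3, peak_center=10, spread=2):
    """Return [position, state] rows, where state 1 marks a peak."""
    if not z_scores:
        return []
    # State 0 = internal, state 1 = peak; log_trans[a][b] is log P(a -> b).
    log_trans = [
        [math.log(1 - i_to_p), math.log(i_to_p)],
        [math.log(1 - p_to_p), math.log(p_to_p)],
    ]
    # Assumption: the internal state emits z-scores distributed as N(0, 1).
    emit_params = [(0.0, 1.0), (peak_center, spread)]

    def emit(state, z):
        mean, std = emit_params[state]
        return log_normal_pdf(z, mean, std)

    # Assumed start distribution: the internal-state transition row.
    score = [log_trans[0][s] + emit(s, z_scores[0][1]) for s in (0, 1)]
    back = []
    for _, z in z_scores[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor for landing in state s at this position.
            best_prev = max((0, 1), key=lambda prev: score[prev] + log_trans[prev][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + emit(s, z))
        score = new_score
        back.append(pointers)

    # Trace the most likely state path back from the final position.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [[pos, st] for (pos, _), st in zip(z_scores, path)]


demo = [[1, 0.1], [2, -0.3], [3, 0.5], [4, 11.0], [5, 9.5], [6, 0.2], [7, -0.1]]
peaks = viterbi_peaks_sketch(demo)
print(peaks)
```

In this toy input only positions 4 and 5 carry z-scores near the assumed peak mean, so they are the only rows assigned to the peak state.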

rendseq.make_peaks.main_make_peaks()

Run the main peak making from command line.

rendseq.make_peaks.parse_args_make_peaks(args)

Parse command line arguments.

rendseq.make_peaks.thresh_peaks(z_scores, thresh=None, method='kink')

Find peaks by calling z-scores above a threshold as a peak.

Parameters:
  • z_scores – a 2xn array of nt positions and z-scores at that position.

  • thresh – the threshold value to use. If none is provided, it will be automatically calculated.

  • method – the method to use to automatically calculate the z-score threshold if none is provided. The default method is “kink”.
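The thresholding idea itself is simple enough to sketch directly. The example below covers only the case where a threshold is supplied; the automatic “kink” calculation is not described here and is not reproduced. thresh_peaks_sketch is a hypothetical name, and marking peaks with 1 and other positions with 0 is an assumption about the output encoding.

```python
def thresh_peaks_sketch(z_scores, thresh):
    """Mark each position whose z-score exceeds the threshold."""
    # Assumption: peaks are encoded as 1 and all other positions as 0.
    return [[position, 1 if score > thresh else 0] for position, score in z_scores]


demo = [[1, 0.2], [2, 7.5], [3, -0.4], [4, 12.0]]
called = thresh_peaks_sketch(demo, thresh=5)
print(called)  # [[1, 0], [2, 1], [3, 0], [4, 1]]
```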
