Frequencies

This is a simple PowerShell script that counts how often each first letter occurs across the lines of a sample file.

gc './sample' | %{ $_.substring(0,1) } | group

Running this over, say, the FTSE 100 symbol list returns:

Count Name                      Group
----- ----                      -----
   10 A                         {A, A, A, A...}
   11 B                         {B, B, B, B...}
    4 C                         {C, C, C, C}
    2 D                         {D, D}
    2 E                         {E, E}
    3 F                         {F, F, F}
    3 G                         {G, G, G}
    5 H                         {H, H, H, H...}
    8 I                         {I, I, I, I...}
    3 J                         {J, J, J}
    1 K                         {K}
    4 L                         {L, L, L, L}
    4 M                         {M, M, M, M}
    3 N                         {N, N, N}
    1 O                         {O}
    6 P                         {P, P, P, P...}
    8 R                         {R, R, R, R...}
   15 S                         {S, S, S, S...}
    2 T                         {T, T}
    2 U                         {U, U}
    1 V                         {V}
    2 W                         {W, W}

This highlights that the symbols are not uniformly spread across the alphabet.

A-F holds about a third of the symbols (32 of 100), as does P-Z (36), even though A-F covers 6 letters and P-Z covers 11.

I found this out once when trying to use the ticker symbol to load-balance market data across 3 servers.

Check the distribution of the data before you use a simple key.
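
To see the skew in numbers, here is a minimal sketch that feeds the first-letter counts from the table above through a hypothetical routing rule. It assumes the three servers each take an equal-width slice of the alphabet (A-I, J-R, S-Z); the actual split used at the time is not stated, so treat the rule and the variable names as illustrative only.

```powershell
# First-letter counts copied from the FTSE 100 output above.
$firstLetterCounts = [ordered]@{
    A = 10; B = 11; C = 4; D = 2; E = 2; F = 3; G = 3; H = 5; I = 8
    J = 3;  K = 1;  L = 4; M = 4; N = 3; O = 1; P = 6; R = 8
    S = 15; T = 2;  U = 2; V = 1; W = 2
}

$alphabet    = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$serverCount = 3                      # assumption: three equal-width letter ranges
$load        = @(0) * $serverCount    # symbols routed to each server

foreach ($letter in $firstLetterCounts.Keys) {
    # Map the letter's position in the alphabet onto a server index 0..2.
    $index  = $alphabet.IndexOf($letter)
    $server = [int][math]::Floor($index * $serverCount / $alphabet.Length)
    $load[$server] += $firstLetterCounts[$letter]
}

0..($serverCount - 1) | ForEach-Object {
    'Server {0}: {1} symbols' -f $_, $load[$_]
}
```

With these counts the three ranges receive 48, 30 and 22 symbols, so the first server carries more than twice the load of the third.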

Oddly, the second letter is a better key:

Count Name                      Group
----- ----                      -----
    6 A                         {A, A, A, A...}
    3 B                         {B, B, B}
    4 C                         {C, C, C, C}
    5 D                         {D, D, D, D...}
    3 E                         {E, E, E}
    6 G                         {G, G, G, G...}
    4 H                         {H, H, H, H}
    3 I                         {I, I, I}
    2 K                         {K, K}
    8 L                         {L, L, L, L...}
    6 M                         {M, M, M, M...}
    7 N                         {N, N, N, N...}
    2 O                         {O, O}
    4 P                         {P, P, P, P}
    8 R                         {R, R, R, R...}
    9 S                         {S, S, S, S...}
    7 T                         {T, T, T, T...}
    2 U                         {U, U}
    6 V                         {V, V, V, V...}
    2 W                         {W, W}
    2 X                         {X, X}
    1 Z                         {Z}
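
The command for the second letter is not shown; presumably it is the same pipeline with the substring index changed from 0 to 1. A minimal sketch of that idea, wrapped in a hypothetical function named `Get-LetterFrequency` (not part of PowerShell) and with a length guard added so that blank lines or one-character symbols are skipped rather than throwing:

```powershell
# Hypothetical helper: count how often each character occurs at a given
# zero-based position across the lines of a file.
function Get-LetterFrequency {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Index = 0      # 0 = first letter, 1 = second letter
    )

    # Lines too short to have a character at $Index are dropped first,
    # because Substring would otherwise throw on them.
    Get-Content $Path |
        Where-Object { $_.Length -gt $Index } |
        ForEach-Object { $_.Substring($Index, 1) } |
        Group-Object |
        Sort-Object Name
}

# Example, assuming './sample' holds one ticker symbol per line:
#   Get-LetterFrequency -Path './sample' -Index 1
```

Trying a few values of `-Index` against the real symbol list is a quick way to compare candidate keys before committing to one.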
