Probability of a 3 digits occurence from a sequence of 64 random hexadecimal symbols? The Next CEO of Stack OverflowTold by Professor that this is PIE, but don't see how it's PIE. Help understanding what constitutes the sets, or alternative ways to solve?Probability of finding 2012 before any other occurence of 012 in a random infinite sequence of digits 0,1,2Probability of random integer's digits summing to 12Roulette Probability - Occurence of the Same ValueProbability of co-occurenceProbability that among 3 random digits two different oneOccurence and Position of digits in a combination lockProbability of occurence of alphabetsProbability of Occurence of HEART or EARTHProbability of Next OccurenceProbability of having a first occurence in Poisson random distribution
Which one is the true statement?
Should I tutor a student who I know has cheated on their homework?
The past simple of "gaslight" – "gaslighted" or "gaslit"?
Rotate a column
What was the first Unix version to run on a microcomputer?
How to prove a simple equation?
Would a completely good Muggle be able to use a wand?
Can you be charged for obstruction for refusing to answer questions?
What did we know about the Kessel run before the prequels?
Reference request: Grassmannian and Plucker coordinates in type B, C, D
Is it possible to replace duplicates of a character with one character using tr
How to avoid supervisors with prejudiced views?
How I can get glyphs from a fraktur font and use them as identifiers?
How to invert MapIndexed on a ragged structure? How to construct a tree from rules?
Why did CATV standarize in 75 ohms and everyone else in 50?
What does "Its cash flow is deeply negative" mean?
How to install OpenCV on Raspbian Stretch?
Why didn't Khan get resurrected in the Genesis Explosion?
TikZ: How to reverse arrow direction without switching start/end point?
Help understanding this unsettling image of Titan, Epimetheus, and Saturn's rings?
Flying from Cape Town to England and return to another province
A Man With a Stainless Steel Endoskeleton (like The Terminator) Fighting Cloaked Aliens Only He Can See
How to edit “Name” property in GCI output?
How to delete every two lines after 3rd lines in a file contains very large number of lines?
Probability of a 3 digits occurence from a sequence of 64 random hexadecimal symbols?
The Next CEO of Stack OverflowTold by Professor that this is PIE, but don't see how it's PIE. Help understanding what constitutes the sets, or alternative ways to solve?Probability of finding 2012 before any other occurence of 012 in a random infinite sequence of digits 0,1,2Probability of random integer's digits summing to 12Roulette Probability - Occurence of the Same ValueProbability of co-occurenceProbability that among 3 random digits two different oneOccurence and Position of digits in a combination lockProbability of occurence of alphabetsProbability of Occurence of HEART or EARTHProbability of Next OccurenceProbability of having a first occurence in Poisson random distribution
$begingroup$
I have a software test failing from time to time, due to the fact that the string 123
appears when generating a string with SecureRandom.hex(32)
(Ruby), which gives a 64 symbols string.
This includes (A-F) and (0-9), so 16 symbols possible.
I have played a little, and I noticed that it can roughly happens 1500 times over 100,000 calls to the function, so around 1.5%.
What would be the formula to have the exact probability of this test failing?
As a developer, I had to share a little bit of code: JS Fiddle.
Edit: Answer's WolframAlpha link
54392126209570846953192790832763093471349103670511891931049735583247364107/
3618502788666131106986593281521497120414687020801267626233049500247285301248
≈0.0150317
≈1.50317%
probability combinatorics
$endgroup$
add a comment |
$begingroup$
I have a software test failing from time to time, due to the fact that the string 123
appears when generating a string with SecureRandom.hex(32)
(Ruby), which gives a 64 symbols string.
This includes (A-F) and (0-9), so 16 symbols possible.
I have played a little, and I noticed that it can roughly happens 1500 times over 100,000 calls to the function, so around 1.5%.
What would be the formula to have the exact probability of this test failing?
As a developer, I had to share a little bit of code: JS Fiddle.
Edit: Answer's WolframAlpha link
54392126209570846953192790832763093471349103670511891931049735583247364107/
3618502788666131106986593281521497120414687020801267626233049500247285301248
≈0.0150317
≈1.50317%
probability combinatorics
$endgroup$
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
1
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55
add a comment |
$begingroup$
I have a software test failing from time to time, due to the fact that the string 123
appears when generating a string with SecureRandom.hex(32)
(Ruby), which gives a 64 symbols string.
This includes (A-F) and (0-9), so 16 symbols possible.
I have played a little, and I noticed that it can roughly happens 1500 times over 100,000 calls to the function, so around 1.5%.
What would be the formula to have the exact probability of this test failing?
As a developer, I had to share a little bit of code: JS Fiddle.
Edit: Answer's WolframAlpha link
54392126209570846953192790832763093471349103670511891931049735583247364107/
3618502788666131106986593281521497120414687020801267626233049500247285301248
≈0.0150317
≈1.50317%
probability combinatorics
$endgroup$
I have a software test failing from time to time, due to the fact that the string 123
appears when generating a string with SecureRandom.hex(32)
(Ruby), which gives a 64 symbols string.
This includes (A-F) and (0-9), so 16 symbols possible.
I have played a little, and I noticed that it can roughly happens 1500 times over 100,000 calls to the function, so around 1.5%.
What would be the formula to have the exact probability of this test failing?
As a developer, I had to share a little bit of code: JS Fiddle.
Edit: Answer's WolframAlpha link
54392126209570846953192790832763093471349103670511891931049735583247364107/
3618502788666131106986593281521497120414687020801267626233049500247285301248
≈0.0150317
≈1.50317%
probability combinatorics
probability combinatorics
edited Apr 20 '18 at 15:12
Cyril Gandon
asked Apr 20 '18 at 13:30
Cyril GandonCyril Gandon
1136
1136
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
1
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55
add a comment |
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
1
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
1
1
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55
add a comment |
3 Answers
3
active
oldest
votes
$begingroup$
An exact answer, via Inclusion Exclusion Principle:
Let $A_i =$ the set of 64-long strings where "123" appears starting at the $i$th position ($1 le i le 62$). E.g. if "123" appears at both the 2nd and 40th positions in string $x$, then $x in A_2$ and $x in A_40$, i.e. $x in A_2 cap A_40 $.
Then $S = bigcup_i A_i=$ the set of strings where "123" appears at least once, and your desired probability is $|S|/16^64$.
The Inclusion Exclusion Principle gives:
$$|S| = |bigcup A_i| = sum_i |A_i| - sum_i<j |A_i cap A_j| + sum_i<j<k |A_i cap A_j cap A_k| - dots $$
Obviously $|A_i| = 16^61$ as the other 61 positions can be anything. So the first term is $sum_i |A_i| = 62 cdot 16^61$.
Now two occurrences of "123" cannot overlap (unlike e.g. if you're looking for "111" in which case they CAN overlap). So if $j - i < 3$, then $|A_i cap A_j| = 0$. For the non-trivial case of $j - i ge 3$, the other $64 - 3 - 3 = 58$ positions can be filled with anything, so $|A_i cap A_j| = 16^58$.
Meanwhile, how many $(i, j)$ pairs are there s.t. $j - i ge 3$? Imagine "merging" each "123" into a new character "!", so the final string length is now $64-2-2 = 60$ instead and there are two "!" characters. The number of ways to do this is simply $60 choose 2$. Thus the 2nd term is:
$$ sum_i<j |A_i cap A_j| = 60 choose 2 16^58$$
The $m$th term is similar and we have:
$$ sum_i_1 < i_2 < cdots < i_m |A_i_1 cap A_i_2 cap cdots cap A_i_m| = 64 - 2m choose m 16^64 - 3m $$
So the final result is:
$$ |S| = sum_m=1^21 (-1)^m+1 64 - 2m choose m 16^64 - 3m $$
$endgroup$
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
add a comment |
$begingroup$
This answer is based upon the Goulden-Jackson Cluster Method.
We consider the set words of length $ngeq 0$ built from the alphabet $mathcalV=0,ldots,9,A,ldots,F$ and the set $B=123$ of bad words, which are not allowed to be part of the words we are looking for.
We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of these words of length $n$.
Since we are looking for the number of words which contain the bad word $123$, the resulting generating function is the generating function of all words minus $f(s)$
beginalign*
&1+16s+16^2s^2+16^3s^3cdots-f(s)=frac11-16s-f(s)
endalign*
According to the paper (p.7) the generating function $f(s)$ is
beginalign*
f(s)=frac11-ds-textweight(mathcalC)tag1
endalign*
with $d=|mathcalV|=16$, the size of the alphabet and $mathcalC$ the weight-numerator of bad words with
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])
endalign*
We calculate according to the paper
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])&=-s^3tag2\
endalign*
It follows from (1) and (2)
beginalign*
f(s)=frac11-16s+s^3\
endalign*
and the generated function counting all strings which contain $123$ is
beginalign*
&colorbluefrac11-16s-frac11-16s+s^3=s^3 + colorblue32 s^4 + 768 s^5 + 16,383 s^6 +cdotstag3\
endalign*
The coefficients of the series were calculated with the help of Wolfram Alpha. We see for instance there are $colorblue32$ words of length $4$ which contain the bad word $123$. These are $$(0..9,A..F)123qquadtext and qquad 123(0..9,A..F).$$
The coefficient of $s^n$
In fact we are interested in the coefficient of $s^64$. We calculate the coefficient of $s^n$ from (3). It is convenient to use the coefficient of operator $[s^n]$ to denote the coefficient of $s^n$ of a series.
We obtain from (3) for $ngeq 0$
beginalign*
colorblue[s^n]&colorblueleft(frac11-16s-frac11-16s+s^3right)\
&=[s^n]sum_m=0^infty 16^ms^m-[s^n]sum_m=0^infty s^m(16-s^2)^m\
&=16^n-[s^n]sum_m=0^infty s^msum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^n-m]sum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^m]sum_k=0^n-mbinomn-mk(-1)^ks^2k16^n-m-k\
&=16^n-sum_m=0^lfloor n/2 rfloor [s^2m]sum_k=0^n-2mbinomn-2mk(-1)^ks^2k16^n-2m-k\
&=16^n-sum_m=0^lfloor n/3 rfloorbinomn-2mm(-1)^m16^n-3m\
&,,colorblue=sum_m=1^lfloor n/3 rfloorbinomn-2mm(-1)^m+116^n-3mtag4\
endalign*
Finally we obtain from (4) in accordance with the answer of @antkam the wanted probability
beginalign*
colorblue[s^64]colorblueleft(frac11-16s-frac11-16s+s^3right)16^-64&colorblue=sum_m=1^21binom64-2mm(-1)^m+116^-3m\
&colorbluesimeq 0.01503,16662,40506
endalign*
which gives roughly $colorblue1.5%$.
$endgroup$
add a comment |
$begingroup$
Just a quick hint: total number of all possible strings is $16^32$, the number of strings that contain at least one occurrence of "$123$" is $32 cdot 16^29=2cdot 16^30$ ($32$ spots to put substring "$123$" and we need to fill remaining $29$ positions). The probability for truly random generation should be $$frac2cdot 16^3016^32=frac1128$$
$endgroup$
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "69"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2746006%2fprobability-of-a-3-digits-occurence-from-a-sequence-of-64-random-hexadecimal-sym%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
An exact answer, via Inclusion Exclusion Principle:
Let $A_i =$ the set of 64-long strings where "123" appears starting at the $i$th position ($1 le i le 62$). E.g. if "123" appears at both the 2nd and 40th positions in string $x$, then $x in A_2$ and $x in A_40$, i.e. $x in A_2 cap A_40 $.
Then $S = bigcup_i A_i=$ the set of strings where "123" appears at least once, and your desired probability is $|S|/16^64$.
The Inclusion Exclusion Principle gives:
$$|S| = |bigcup A_i| = sum_i |A_i| - sum_i<j |A_i cap A_j| + sum_i<j<k |A_i cap A_j cap A_k| - dots $$
Obviously $|A_i| = 16^61$ as the other 61 positions can be anything. So the first term is $sum_i |A_i| = 62 cdot 16^61$.
Now two occurrences of "123" cannot overlap (unlike e.g. if you're looking for "111" in which case they CAN overlap). So if $j - i < 3$, then $|A_i cap A_j| = 0$. For the non-trivial case of $j - i ge 3$, the other $64 - 3 - 3 = 58$ positions can be filled with anything, so $|A_i cap A_j| = 16^58$.
Meanwhile, how many $(i, j)$ pairs are there s.t. $j - i ge 3$? Imagine "merging" each "123" into a new character "!", so the final string length is now $64-2-2 = 60$ instead and there are two "!" characters. The number of ways to do this is simply $60 choose 2$. Thus the 2nd term is:
$$ sum_i<j |A_i cap A_j| = 60 choose 2 16^58$$
The $m$th term is similar and we have:
$$ sum_i_1 < i_2 < cdots < i_m |A_i_1 cap A_i_2 cap cdots cap A_i_m| = 64 - 2m choose m 16^64 - 3m $$
So the final result is:
$$ |S| = sum_m=1^21 (-1)^m+1 64 - 2m choose m 16^64 - 3m $$
$endgroup$
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
add a comment |
$begingroup$
An exact answer, via Inclusion Exclusion Principle:
Let $A_i =$ the set of 64-long strings where "123" appears starting at the $i$th position ($1 le i le 62$). E.g. if "123" appears at both the 2nd and 40th positions in string $x$, then $x in A_2$ and $x in A_40$, i.e. $x in A_2 cap A_40 $.
Then $S = bigcup_i A_i=$ the set of strings where "123" appears at least once, and your desired probability is $|S|/16^64$.
The Inclusion Exclusion Principle gives:
$$|S| = |bigcup A_i| = sum_i |A_i| - sum_i<j |A_i cap A_j| + sum_i<j<k |A_i cap A_j cap A_k| - dots $$
Obviously $|A_i| = 16^61$ as the other 61 positions can be anything. So the first term is $sum_i |A_i| = 62 cdot 16^61$.
Now two occurrences of "123" cannot overlap (unlike e.g. if you're looking for "111" in which case they CAN overlap). So if $j - i < 3$, then $|A_i cap A_j| = 0$. For the non-trivial case of $j - i ge 3$, the other $64 - 3 - 3 = 58$ positions can be filled with anything, so $|A_i cap A_j| = 16^58$.
Meanwhile, how many $(i, j)$ pairs are there s.t. $j - i ge 3$? Imagine "merging" each "123" into a new character "!", so the final string length is now $64-2-2 = 60$ instead and there are two "!" characters. The number of ways to do this is simply $60 choose 2$. Thus the 2nd term is:
$$ sum_i<j |A_i cap A_j| = 60 choose 2 16^58$$
The $m$th term is similar and we have:
$$ sum_i_1 < i_2 < cdots < i_m |A_i_1 cap A_i_2 cap cdots cap A_i_m| = 64 - 2m choose m 16^64 - 3m $$
So the final result is:
$$ |S| = sum_m=1^21 (-1)^m+1 64 - 2m choose m 16^64 - 3m $$
$endgroup$
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
add a comment |
$begingroup$
An exact answer, via Inclusion Exclusion Principle:
Let $A_i =$ the set of 64-long strings where "123" appears starting at the $i$th position ($1 le i le 62$). E.g. if "123" appears at both the 2nd and 40th positions in string $x$, then $x in A_2$ and $x in A_40$, i.e. $x in A_2 cap A_40 $.
Then $S = bigcup_i A_i=$ the set of strings where "123" appears at least once, and your desired probability is $|S|/16^64$.
The Inclusion Exclusion Principle gives:
$$|S| = |bigcup A_i| = sum_i |A_i| - sum_i<j |A_i cap A_j| + sum_i<j<k |A_i cap A_j cap A_k| - dots $$
Obviously $|A_i| = 16^61$ as the other 61 positions can be anything. So the first term is $sum_i |A_i| = 62 cdot 16^61$.
Now two occurrences of "123" cannot overlap (unlike e.g. if you're looking for "111" in which case they CAN overlap). So if $j - i < 3$, then $|A_i cap A_j| = 0$. For the non-trivial case of $j - i ge 3$, the other $64 - 3 - 3 = 58$ positions can be filled with anything, so $|A_i cap A_j| = 16^58$.
Meanwhile, how many $(i, j)$ pairs are there s.t. $j - i ge 3$? Imagine "merging" each "123" into a new character "!", so the final string length is now $64-2-2 = 60$ instead and there are two "!" characters. The number of ways to do this is simply $60 choose 2$. Thus the 2nd term is:
$$ sum_i<j |A_i cap A_j| = 60 choose 2 16^58$$
The $m$th term is similar and we have:
$$ sum_i_1 < i_2 < cdots < i_m |A_i_1 cap A_i_2 cap cdots cap A_i_m| = 64 - 2m choose m 16^64 - 3m $$
So the final result is:
$$ |S| = sum_m=1^21 (-1)^m+1 64 - 2m choose m 16^64 - 3m $$
$endgroup$
An exact answer, via Inclusion Exclusion Principle:
Let $A_i =$ the set of 64-long strings where "123" appears starting at the $i$th position ($1 le i le 62$). E.g. if "123" appears at both the 2nd and 40th positions in string $x$, then $x in A_2$ and $x in A_40$, i.e. $x in A_2 cap A_40 $.
Then $S = bigcup_i A_i=$ the set of strings where "123" appears at least once, and your desired probability is $|S|/16^64$.
The Inclusion Exclusion Principle gives:
$$|S| = |bigcup A_i| = sum_i |A_i| - sum_i<j |A_i cap A_j| + sum_i<j<k |A_i cap A_j cap A_k| - dots $$
Obviously $|A_i| = 16^61$ as the other 61 positions can be anything. So the first term is $sum_i |A_i| = 62 cdot 16^61$.
Now two occurrences of "123" cannot overlap (unlike e.g. if you're looking for "111" in which case they CAN overlap). So if $j - i < 3$, then $|A_i cap A_j| = 0$. For the non-trivial case of $j - i ge 3$, the other $64 - 3 - 3 = 58$ positions can be filled with anything, so $|A_i cap A_j| = 16^58$.
Meanwhile, how many $(i, j)$ pairs are there s.t. $j - i ge 3$? Imagine "merging" each "123" into a new character "!", so the final string length is now $64-2-2 = 60$ instead and there are two "!" characters. The number of ways to do this is simply $60 choose 2$. Thus the 2nd term is:
$$ sum_i<j |A_i cap A_j| = 60 choose 2 16^58$$
The $m$th term is similar and we have:
$$ sum_i_1 < i_2 < cdots < i_m |A_i_1 cap A_i_2 cap cdots cap A_i_m| = 64 - 2m choose m 16^64 - 3m $$
So the final result is:
$$ |S| = sum_m=1^21 (-1)^m+1 64 - 2m choose m 16^64 - 3m $$
answered Apr 20 '18 at 14:58
antkamantkam
2,587312
2,587312
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
add a comment |
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
$begingroup$
Awesome, I would have never been able to compute that!
$endgroup$
– Cyril Gandon
Apr 20 '18 at 15:14
add a comment |
$begingroup$
This answer is based upon the Goulden-Jackson Cluster Method.
We consider the set words of length $ngeq 0$ built from the alphabet $mathcalV=0,ldots,9,A,ldots,F$ and the set $B=123$ of bad words, which are not allowed to be part of the words we are looking for.
We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of these words of length $n$.
Since we are looking for the number of words which contain the bad word $123$, the resulting generating function is the generating function of all words minus $f(s)$
beginalign*
&1+16s+16^2s^2+16^3s^3cdots-f(s)=frac11-16s-f(s)
endalign*
According to the paper (p.7) the generating function $f(s)$ is
beginalign*
f(s)=frac11-ds-textweight(mathcalC)tag1
endalign*
with $d=|mathcalV|=16$, the size of the alphabet and $mathcalC$ the weight-numerator of bad words with
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])
endalign*
We calculate according to the paper
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])&=-s^3tag2\
endalign*
It follows from (1) and (2)
beginalign*
f(s)=frac11-16s+s^3\
endalign*
and the generated function counting all strings which contain $123$ is
beginalign*
&colorbluefrac11-16s-frac11-16s+s^3=s^3 + colorblue32 s^4 + 768 s^5 + 16,383 s^6 +cdotstag3\
endalign*
The coefficients of the series were calculated with the help of Wolfram Alpha. We see for instance there are $colorblue32$ words of length $4$ which contain the bad word $123$. These are $$(0..9,A..F)123qquadtext and qquad 123(0..9,A..F).$$
The coefficient of $s^n$
In fact we are interested in the coefficient of $s^64$. We calculate the coefficient of $s^n$ from (3). It is convenient to use the coefficient of operator $[s^n]$ to denote the coefficient of $s^n$ of a series.
We obtain from (3) for $ngeq 0$
beginalign*
colorblue[s^n]&colorblueleft(frac11-16s-frac11-16s+s^3right)\
&=[s^n]sum_m=0^infty 16^ms^m-[s^n]sum_m=0^infty s^m(16-s^2)^m\
&=16^n-[s^n]sum_m=0^infty s^msum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^n-m]sum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^m]sum_k=0^n-mbinomn-mk(-1)^ks^2k16^n-m-k\
&=16^n-sum_m=0^lfloor n/2 rfloor [s^2m]sum_k=0^n-2mbinomn-2mk(-1)^ks^2k16^n-2m-k\
&=16^n-sum_m=0^lfloor n/3 rfloorbinomn-2mm(-1)^m16^n-3m\
&,,colorblue=sum_m=1^lfloor n/3 rfloorbinomn-2mm(-1)^m+116^n-3mtag4\
endalign*
Finally we obtain from (4) in accordance with the answer of @antkam the wanted probability
beginalign*
colorblue[s^64]colorblueleft(frac11-16s-frac11-16s+s^3right)16^-64&colorblue=sum_m=1^21binom64-2mm(-1)^m+116^-3m\
&colorbluesimeq 0.01503,16662,40506
endalign*
which gives roughly $colorblue1.5%$.
$endgroup$
add a comment |
$begingroup$
This answer is based upon the Goulden-Jackson Cluster Method.
We consider the set words of length $ngeq 0$ built from the alphabet $mathcalV=0,ldots,9,A,ldots,F$ and the set $B=123$ of bad words, which are not allowed to be part of the words we are looking for.
We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of these words of length $n$.
Since we are looking for the number of words which contain the bad word $123$, the resulting generating function is the generating function of all words minus $f(s)$
beginalign*
&1+16s+16^2s^2+16^3s^3cdots-f(s)=frac11-16s-f(s)
endalign*
According to the paper (p.7) the generating function $f(s)$ is
beginalign*
f(s)=frac11-ds-textweight(mathcalC)tag1
endalign*
with $d=|mathcalV|=16$, the size of the alphabet and $mathcalC$ the weight-numerator of bad words with
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])
endalign*
We calculate according to the paper
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])&=-s^3tag2\
endalign*
It follows from (1) and (2)
beginalign*
f(s)=frac11-16s+s^3\
endalign*
and the generated function counting all strings which contain $123$ is
beginalign*
&colorbluefrac11-16s-frac11-16s+s^3=s^3 + colorblue32 s^4 + 768 s^5 + 16,383 s^6 +cdotstag3\
endalign*
The coefficients of the series were calculated with the help of Wolfram Alpha. We see for instance there are $colorblue32$ words of length $4$ which contain the bad word $123$. These are $$(0..9,A..F)123qquadtext and qquad 123(0..9,A..F).$$
The coefficient of $s^n$
In fact we are interested in the coefficient of $s^64$. We calculate the coefficient of $s^n$ from (3). It is convenient to use the coefficient of operator $[s^n]$ to denote the coefficient of $s^n$ of a series.
We obtain from (3) for $ngeq 0$
beginalign*
colorblue[s^n]&colorblueleft(frac11-16s-frac11-16s+s^3right)\
&=[s^n]sum_m=0^infty 16^ms^m-[s^n]sum_m=0^infty s^m(16-s^2)^m\
&=16^n-[s^n]sum_m=0^infty s^msum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^n-m]sum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^m]sum_k=0^n-mbinomn-mk(-1)^ks^2k16^n-m-k\
&=16^n-sum_m=0^lfloor n/2 rfloor [s^2m]sum_k=0^n-2mbinomn-2mk(-1)^ks^2k16^n-2m-k\
&=16^n-sum_m=0^lfloor n/3 rfloorbinomn-2mm(-1)^m16^n-3m\
&,,colorblue=sum_m=1^lfloor n/3 rfloorbinomn-2mm(-1)^m+116^n-3mtag4\
endalign*
Finally we obtain from (4) in accordance with the answer of @antkam the wanted probability
beginalign*
colorblue[s^64]colorblueleft(frac11-16s-frac11-16s+s^3right)16^-64&colorblue=sum_m=1^21binom64-2mm(-1)^m+116^-3m\
&colorbluesimeq 0.01503,16662,40506
endalign*
which gives roughly $colorblue1.5%$.
$endgroup$
add a comment |
$begingroup$
This answer is based upon the Goulden-Jackson Cluster Method.
We consider the set words of length $ngeq 0$ built from the alphabet $mathcalV=0,ldots,9,A,ldots,F$ and the set $B=123$ of bad words, which are not allowed to be part of the words we are looking for.
We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of these words of length $n$.
Since we are looking for the number of words which contain the bad word $123$, the resulting generating function is the generating function of all words minus $f(s)$
beginalign*
&1+16s+16^2s^2+16^3s^3cdots-f(s)=frac11-16s-f(s)
endalign*
According to the paper (p.7) the generating function $f(s)$ is
beginalign*
f(s)=frac11-ds-textweight(mathcalC)tag1
endalign*
with $d=|mathcalV|=16$, the size of the alphabet and $mathcalC$ the weight-numerator of bad words with
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])
endalign*
We calculate according to the paper
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])&=-s^3tag2\
endalign*
It follows from (1) and (2)
beginalign*
f(s)=frac11-16s+s^3\
endalign*
and the generated function counting all strings which contain $123$ is
beginalign*
&colorbluefrac11-16s-frac11-16s+s^3=s^3 + colorblue32 s^4 + 768 s^5 + 16,383 s^6 +cdotstag3\
endalign*
The coefficients of the series were calculated with the help of Wolfram Alpha. We see for instance there are $colorblue32$ words of length $4$ which contain the bad word $123$. These are $$(0..9,A..F)123qquadtext and qquad 123(0..9,A..F).$$
The coefficient of $s^n$
In fact we are interested in the coefficient of $s^64$. We calculate the coefficient of $s^n$ from (3). It is convenient to use the coefficient of operator $[s^n]$ to denote the coefficient of $s^n$ of a series.
We obtain from (3) for $ngeq 0$
beginalign*
colorblue[s^n]&colorblueleft(frac11-16s-frac11-16s+s^3right)\
&=[s^n]sum_m=0^infty 16^ms^m-[s^n]sum_m=0^infty s^m(16-s^2)^m\
&=16^n-[s^n]sum_m=0^infty s^msum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^n-m]sum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^m]sum_k=0^n-mbinomn-mk(-1)^ks^2k16^n-m-k\
&=16^n-sum_m=0^lfloor n/2 rfloor [s^2m]sum_k=0^n-2mbinomn-2mk(-1)^ks^2k16^n-2m-k\
&=16^n-sum_m=0^lfloor n/3 rfloorbinomn-2mm(-1)^m16^n-3m\
&,,colorblue=sum_m=1^lfloor n/3 rfloorbinomn-2mm(-1)^m+116^n-3mtag4\
endalign*
Finally we obtain from (4) in accordance with the answer of @antkam the wanted probability
beginalign*
colorblue[s^64]colorblueleft(frac11-16s-frac11-16s+s^3right)16^-64&colorblue=sum_m=1^21binom64-2mm(-1)^m+116^-3m\
&colorbluesimeq 0.01503,16662,40506
endalign*
which gives roughly $colorblue1.5%$.
$endgroup$
This answer is based upon the Goulden-Jackson Cluster Method.
We consider the set words of length $ngeq 0$ built from the alphabet $mathcalV=0,ldots,9,A,ldots,F$ and the set $B=123$ of bad words, which are not allowed to be part of the words we are looking for.
We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of these words of length $n$.
Since we are looking for the number of words which contain the bad word $123$, the resulting generating function is the generating function of all words minus $f(s)$
beginalign*
&1+16s+16^2s^2+16^3s^3cdots-f(s)=frac11-16s-f(s)
endalign*
According to the paper (p.7) the generating function $f(s)$ is
beginalign*
f(s)=frac11-ds-textweight(mathcalC)tag1
endalign*
with $d=|mathcalV|=16$, the size of the alphabet and $mathcalC$ the weight-numerator of bad words with
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])
endalign*
We calculate according to the paper
beginalign*
textweight(mathcalC)=textweight(mathcalC[123])&=-s^3tag2\
endalign*
It follows from (1) and (2)
beginalign*
f(s)=frac11-16s+s^3\
endalign*
and the generated function counting all strings which contain $123$ is
beginalign*
&colorbluefrac11-16s-frac11-16s+s^3=s^3 + colorblue32 s^4 + 768 s^5 + 16,383 s^6 +cdotstag3\
endalign*
The coefficients of the series were calculated with the help of Wolfram Alpha. We see for instance there are $colorblue32$ words of length $4$ which contain the bad word $123$. These are $$(0..9,A..F)123qquadtext and qquad 123(0..9,A..F).$$
The coefficient of $s^n$
In fact we are interested in the coefficient of $s^64$. We calculate the coefficient of $s^n$ from (3). It is convenient to use the coefficient of operator $[s^n]$ to denote the coefficient of $s^n$ of a series.
We obtain from (3) for $ngeq 0$
beginalign*
colorblue[s^n]&colorblueleft(frac11-16s-frac11-16s+s^3right)\
&=[s^n]sum_m=0^infty 16^ms^m-[s^n]sum_m=0^infty s^m(16-s^2)^m\
&=16^n-[s^n]sum_m=0^infty s^msum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^n-m]sum_k=0^mbinommk(-1)^ks^2k16^m-k\
&=16^n-sum_m=0^n[s^m]sum_k=0^n-mbinomn-mk(-1)^ks^2k16^n-m-k\
&=16^n-sum_m=0^lfloor n/2 rfloor [s^2m]sum_k=0^n-2mbinomn-2mk(-1)^ks^2k16^n-2m-k\
&=16^n-sum_m=0^lfloor n/3 rfloorbinomn-2mm(-1)^m16^n-3m\
&,,colorblue=sum_m=1^lfloor n/3 rfloorbinomn-2mm(-1)^m+116^n-3mtag4\
endalign*
Finally we obtain from (4) in accordance with the answer of @antkam the wanted probability
beginalign*
colorblue[s^64]colorblueleft(frac11-16s-frac11-16s+s^3right)16^-64&colorblue=sum_m=1^21binom64-2mm(-1)^m+116^-3m\
&colorbluesimeq 0.01503,16662,40506
endalign*
which gives roughly $colorblue1.5%$.
edited Mar 27 at 18:14
answered Mar 27 at 18:04
Markus ScheuerMarkus Scheuer
63.5k460151
63.5k460151
add a comment |
add a comment |
$begingroup$
Just a quick hint: total number of all possible strings is $16^32$, the number of strings that contain at least one occurrence of "$123$" is $32 cdot 16^29=2cdot 16^30$ ($32$ spots to put substring "$123$" and we need to fill remaining $29$ positions). The probability for truly random generation should be $$frac2cdot 16^3016^32=frac1128$$
$endgroup$
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
add a comment |
$begingroup$
Just a quick hint: total number of all possible strings is $16^32$, the number of strings that contain at least one occurrence of "$123$" is $32 cdot 16^29=2cdot 16^30$ ($32$ spots to put substring "$123$" and we need to fill remaining $29$ positions). The probability for truly random generation should be $$frac2cdot 16^3016^32=frac1128$$
$endgroup$
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
add a comment |
$begingroup$
Just a quick hint: total number of all possible strings is $16^32$, the number of strings that contain at least one occurrence of "$123$" is $32 cdot 16^29=2cdot 16^30$ ($32$ spots to put substring "$123$" and we need to fill remaining $29$ positions). The probability for truly random generation should be $$frac2cdot 16^3016^32=frac1128$$
$endgroup$
Just a quick hint: total number of all possible strings is $16^32$, the number of strings that contain at least one occurrence of "$123$" is $32 cdot 16^29=2cdot 16^30$ ($32$ spots to put substring "$123$" and we need to fill remaining $29$ positions). The probability for truly random generation should be $$frac2cdot 16^3016^32=frac1128$$
answered Apr 20 '18 at 14:02
VasyaVasya
4,1471618
4,1471618
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
add a comment |
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
1
1
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
$begingroup$
the string length is 64, there are 62 places to put "123", and this argument double-counts the cases where "123" appears twice, etc.
$endgroup$
– antkam
Apr 20 '18 at 14:19
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2746006%2fprobability-of-a-3-digits-occurence-from-a-sequence-of-64-random-hexadecimal-sym%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
Should'nt it be (A-F) and 16 symbols?
$endgroup$
– Bill O'Haran
Apr 20 '18 at 13:38
$begingroup$
You are absolutely right, and hex(32) actually returns 64 symbols too :)
$endgroup$
– Cyril Gandon
Apr 20 '18 at 13:42
1
$begingroup$
Not an answer, but if you are interested in an estimate, assuming each of the 62 3-digit sequences to be independent gives $1 - (1 - (1/16)^3)^62 approx 1.502%$.
$endgroup$
– antkam
Apr 20 '18 at 13:55