Notebook exit code 137. Cause and solution

Yes, I’ve been at it again with Microsoft Fabric, and as I try to find the limits of this cool new toy, the limits sometimes get angry with me and throw an error. Most of the time the error is caused by me and I can usually figure out what’s happening, but not always.

Exit code without a real error

In this case, my notebook threw an error at me even though the command seemed to finish without any issue. Sounds vague? It did to me. The notebook cell I tried to run had a lot of stuff happening at the same time.

A lot of work

As you can see in the above screenshot, the status shows green checkmarks, but there’s an error as well. The error message was not really clear to me, though that may well be my lack of deep-level experience. So I logged a call with Microsoft Support to see what they could come up with.

More hardware!

Long story short, the cause can be seen in the first line of the second error: the container ran out of memory. Exit code 137 is the telltale sign, as it means the process was killed with SIGKILL, which is what happens when a container exceeds its memory limit. Well, who would have known that processing this amount of rows would lead to a lack of memory. And yes, I’d created my workspace with a small standard pool. That wasn’t the best idea, with the benefit of hindsight. On the other hand, it does give some good insight into what a standard Small Spark pool can handle.
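The "137" itself tells you what happened. A minimal sketch of how to decode it, assuming a Linux container runtime, where exit codes above 128 encode a fatal signal:

```python
import signal

# Exit codes above 128 mean the process died from a signal:
# exit_code - 128 gives the signal number.
exit_code = 137
sig_number = exit_code - 128

print(sig_number)                    # 9
print(sig_number == signal.SIGKILL)  # True: SIGKILL, typically sent by the OOM killer
```

So a clean-looking cell followed by exit code 137 usually means the kernel's OOM killer terminated the container from outside, which is why no ordinary Python exception shows up.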

So the advice was to create a custom pool, larger than the current one, and retry. I created a Large pool to see what would happen. The start-up time of the pool was around three minutes, and the process itself finished in 16 minutes. Not only quicker, but also without the error message.

Why?

The support engineer was kind enough to provide links to the documentation. There you can find the following:

Node sizes

A Spark pool can be defined with node sizes that range from a small compute node (with 4 vCore and 32 GB of memory) to a large compute node (with 64 vCore and 512 GB of memory per node). Node sizes can be altered after pool creation, although the active session would have to be restarted.

Size      vCores  Memory
Small     4       32 GB
Medium    8       64 GB
Large     16      128 GB
X-Large   32      256 GB
XX-Large  64      512 GB
Source: https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute

32 GB for a small cluster can be enough, but if it isn’t, you can scale up. But how do these cluster sizes compare to our Fabric capacity units? Because that’s where the key lies.

Every capacity unit provides two vCores for a Spark pool. If you have 2 CUs, you can run one Small Spark cluster of 4 vCores. If you upgrade to 4 CUs, you get 8 vCores and can run either 1 Medium Spark cluster or 2 Small Spark clusters. With each step you move up the capacity unit ladder, your configuration options change. For the F64, these are some options:

Fabric capacity SKU  Capacity units  Spark vCores  Node size  Max number of nodes
F64                  64              128           Small      32
F64                  64              128           Medium     16
F64                  64              128           Large      8
F64                  64              128           X-Large    4
F64                  64              128           XX-Large   2
Source: https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute
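The arithmetic behind that table is simple enough to sketch. This is just the two-vCores-per-CU rule and the node sizes from the documentation cited above; the function name and dictionary are my own illustration, not a Fabric API:

```python
# vCores per node, from the node-size table in the Fabric docs
NODE_VCORES = {"Small": 4, "Medium": 8, "Large": 16, "X-Large": 32, "XX-Large": 64}

def max_nodes(capacity_units: int, node_size: str) -> int:
    """Maximum node count for a given capacity, assuming 2 Spark vCores per CU."""
    total_vcores = capacity_units * 2
    return total_vcores // NODE_VCORES[node_size]

# F64 = 64 CUs = 128 Spark vCores
for size in NODE_VCORES:
    print(f"F64, {size} nodes: up to {max_nodes(64, size)}")
```

Running this reproduces the table: 32 Small nodes, 16 Medium, 8 Large, 4 X-Large, or 2 XX-Large on an F64.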

Important to remember: pausing your Fabric environment pauses all these clusters too.

In any case, raising my issue with Microsoft support taught me some valuable lessons on both reading error messages and understanding the capacity unit definitions of Microsoft Fabric.

Thanks for reading!
